Commit Graph

14499 Commits

Author SHA1 Message Date
Alvaro Herrera 0f9692b40d Fix relpersistence setting in reindex_index
Buildfarm members with CLOBBER_CACHE_ALWAYS advised us that commit
85b506bbfc was mistaken in setting the relpersistence value of the
index directly in the relcache entry, within reindex_index.  The reason
for the failure is that an invalidation message that comes after mucking
with the relcache entry directly, but before writing it to the catalogs,
would cause the entry to become rebuilt in place from catalogs with the
old contents, losing the update.

Fix by passing the correct persistence value to
RelationSetNewRelfilenode instead; this routine also writes the updated
tuple to pg_class, avoiding the problem.  Suggested by Tom Lane.
2014-11-17 11:23:35 -03:00
Peter Eisentraut 7466a1b75f Translation updates 2014-11-16 21:32:51 -05:00
Simon Riggs 0f66d21201 Emit msg re skipping ANALYZE for absent inh tree
When checking a table that has an inheritance tree marked,
if no child tables remain, we skip ANALYZE. This patch emits
a message to show that the action has been skipped.

Author: Etsuro Fujita
Reviewer: Furuya Osamu
2014-11-15 22:49:54 +00:00
Alvaro Herrera 85b506bbfc Get rid of SET LOGGED indexes persistence kludge
This removes ATChangeIndexesPersistence() introduced by f41872d0c1
which was too ugly to live for long.  Instead, the correct persistence
marking is passed all the way down to reindex_index, so that the
transient relation built to contain the index relfilenode can
get marked correctly right from the start.

Author: Fabrízio de Royes Mello
Review and editorialization by Michael Paquier
                                     and Álvaro Herrera
2014-11-15 01:19:49 -03:00
Andres Freund 98ec7fd903 Sync unlogged relations to disk after they have been reset.
Unlogged relations are only reset when performing a unclean
restart. That means they have to be synced to disk during clean
shutdowns. During normal processing that's achieved by registering a
buffer's file to be fsynced at the next checkpoint when flushed. But
ResetUnloggedRelations() doesn't go through the buffer manager, so
nothing will force reset relations to disk before the next shutdown
checkpoint.

So just make ResetUnloggedRelations() fsync the newly created main
forks to disk.

Discussion: 20140912112246.GA4984@alap3.anarazel.de

Backpatch to 9.1 where unlogged tables were introduced.

Abhijit Menon-Sen and Andres Freund
2014-11-15 01:19:31 +01:00
Andres Freund d3586fc8aa Ensure unlogged tables are reset even if crash recovery errors out.
Unlogged relations are reset at the end of crash recovery as they're
only synced to disk during a proper shutdown. Unfortunately that and
later steps can fail, e.g. due to running out of space. This reset
was, up to now performed after marking the database as having finished
crash recovery successfully. As out of space errors trigger a crash
restart that could lead to the situation that not all unlogged
relations are reset.

Once that happend usage of unlogged relations could yield errors like
"could not open file "...": No such file or directory". Luckily
clusters that show the problem can be fixed by performing a immediate
shutdown, and starting the database again.

To fix, just call ResetUnloggedRelations(UNLOGGED_RELATION_INIT)
earlier, before marking the database as having successfully recovered.

Discussion: 20140912112246.GA4984@alap3.anarazel.de

Backpatch to 9.1 where unlogged tables were introduced.

Abhijit Menon-Sen and Andres Freund
2014-11-15 01:19:26 +01:00
Stephen Frost 80eacaa3cd Clean up includes from RLS patch
The initial patch for RLS mistakenly included headers associated with
the executor and planner bits in rewrite/rowsecurity.h.  Per policy and
general good sense, executor headers should not be included in planner
headers or vice versa.

The include of execnodes.h was a mistaken holdover from previous
versions, while the include of relation.h was used for Relation's
definition, which should have been coming from utils/relcache.h.  This
patch cleans these issues up, adds comments to the RowSecurityPolicy
struct and the RowSecurityConfigType enum, and changes Relation->rsdesc
to Relation->rd_rsdesc to follow Relation field naming convention.

Additionally, utils/rel.h was including rewrite/rowsecurity.h, which
wasn't a great idea since that was pulling in things not really needed
in utils/rel.h (which gets included in quite a few places).  Instead,
use 'struct RowSecurityDesc' for the rd_rsdesc field and add comments
explaining why.

Lastly, add an include into access/nbtree/nbtsort.c for
utils/sortsupport.h, which was evidently missed due to the above mess.

Pointed out by Tom in 16970.1415838651@sss.pgh.pa.us; note that the
concerns regarding a similar situation in the custom-path commit still
need to be addressed.
2014-11-14 17:05:17 -05:00
Alvaro Herrera 51f9ea25dc Allow interrupting GetMultiXactIdMembers
This function has a loop which can lead to uninterruptible process
"stalls" (actually infinite loops) when some bugs are triggered.  Avoid
that unpleasant situation by adding a check for interrupts in a place
that shouldn't degrade performance in the normal case.

Backpatch to 9.3.  Older branches have an identical loop here, but the
aforementioned bugs are only a problem starting in 9.3 so there doesn't
seem to be any point in backpatching any further.
2014-11-14 15:14:01 -03:00
Andres Freund 0c5af0a537 Move BufferGetBlockNumber() out of heap_page_is_all_visible()'s inner loop.
In some workloads BufferGetBlockNumber() shows up in profiles due to
the sheer number of calls to it (and because it causes cache
misses). The compiler can't move it out of the loop because it's a
full extern function call...
2014-11-14 17:04:44 +01:00
Peter Eisentraut a15d387c22 Improve logical decoding log messages
suggestions from Robert Haas
2014-11-13 20:44:34 -05:00
Andres Freund 89fd41b390 Fix and improve cache invalidation logic for logical decoding.
There are basically three situations in which logical decoding needs
to perform cache invalidation. During/After replaying a transaction
with catalog changes, when skipping a uninteresting transaction that
performed catalog changes and when erroring out while replaying a
transaction. Unfortunately these three cases were all done slightly
differently - partially because 8de3e410fa, which greatly simplifies
matters, got committed in the midst of the development of logical
decoding.

The actually problematic case was when logical decoding skipped
transaction commits (and thus processed invalidations). When used via
the SQL interface cache invalidation could access the catalog - bad,
because we didn't set up enough state to allow that correctly. It'd
not be hard to setup sufficient state, but the simpler solution is to
always perform cache invalidation outside a valid transaction.

Also make the different cache invalidation cases look as similar as
possible, to ease code review.

This fixes the assertion failure reported by Antonin Houska in
53EE02D9.7040702@gmail.com. The presented testcase has been expanded
into a regression test.

Backpatch to 9.4, where logical decoding was introduced.
2014-11-13 20:34:31 +01:00
Andres Freund 5a2c184058 Fix xmin/xmax horizon computation during logical decoding initialization.
When building the initial historic catalog snapshot there were
scenarios where snapbuild.c would use incorrect xmin/xmax values when
starting from a xl_running_xacts record. The values used were always a
bit suspect, but happened to be correct in the easy to test
cases. Notably the values used when the the initial snapshot was
computed while no other transactions were running were correct.

This is likely to be the cause of the occasional buildfarm failures on
animals markhor and tick; but it's quite possible to reproduce
problems without CLOBBER_CACHE_ALWAYS.

Backpatch to 9.4, where logical decoding was introduced.
2014-11-13 20:34:30 +01:00
Heikki Linnakangas 81c4508196 Fix race condition between hot standby and restoring a full-page image.
There was a window in RestoreBackupBlock where a page would be zeroed out,
but not yet locked. If a backend pinned and locked the page in that window,
it saw the zeroed page instead of the old page or new page contents, which
could lead to missing rows in a result set, or errors.

To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins,
zeroes, and locks the page, if it's not in the buffer cache already.

In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE,
to avoid breaking any 3rd party extensions that might use RBM_ZERO. More
importantly, this avoids renumbering the other enum values, which would
cause even bigger confusion in extensions that use ReadBufferExtended, but
haven't been recompiled.

Backpatch to all supported versions; this has been racy since hot standby
was introduced.
2014-11-13 20:02:37 +02:00
Robert Haas c0828b78e9 Move the guts of our Levenshtein implementation into core.
The hope is that we can use this to produce better diagnostics in
some cases.

Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
2014-11-13 12:33:26 -05:00
Heikki Linnakangas 34402ae351 Fix XLogReadBufferForRedoExtended to get cleanup lock when asked to do so. 2014-11-13 17:54:20 +02:00
Fujii Masao c291503b1c Rename pending_list_cleanup_size to gin_pending_list_limit.
Since this parameter is only for GIN index, it's better to
add "gin" to the parameter name for easier understanding.
2014-11-13 12:14:48 +09:00
Tom Lane 677708032c Explicitly support the case that a plancache's raw_parse_tree is NULL.
This only happens if a client issues a Parse message with an empty query
string, which is a bit odd; but since it is explicitly called out as legal
by our FE/BE protocol spec, we'd probably better continue to allow it.

Fix by adding tests everywhere that the raw_parse_tree field is passed to
functions that don't or shouldn't accept NULL.  Also make it clear in the
relevant comments that NULL is an expected case.

This reverts commits a73c9dbab0 and
2e9650cbcf, which fixed specific crash
symptoms by hacking things at what now seems to be the wrong end, ie the
callee functions.  Making the callees allow NULL is superficially more
robust, but it's not always true that there is a defensible thing for the
callee to do in such cases.  The caller has more context and is better
able to decide what the empty-query case ought to do.

Per followup discussion of bug #11335.  Back-patch to 9.2.  The code
before that is sufficiently different that it would require development
of a separate patch, which doesn't seem worthwhile for what is believed
to be an essentially cosmetic change.
2014-11-12 15:59:01 -05:00
Andres Freund ec5896aed3 Fix several weaknesses in slot and logical replication on-disk serialization.
Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and
snapbuild.c were missing the FIN_CRC32 call when computing/checking
checksums of on disk files. That doesn't lower the the error detection
capabilities of the checksum, but is inconsistent with other usages.

In a followup mail Heikki also noticed that, contrary to a comment,
the 'version' and 'length' struct fields of replication slot's on disk
data where not covered by the checksum. That's not likely to lead to
actually missed corruption as those fields are cross checked with the
expected version and the actual file length. But it's wrong
nonetheless.

As fixing these issues makes existing on disk files unreadable, bump
the expected versions of on disk files for both slots and logical
decoding historic catalog snapshots.  This means that loading old
files will fail with
ERROR: "replication slot file ... has unsupported version 1"
and
ERROR: "snapbuild state file ... has unsupported version 1 instead of
2" respectively. Given the low likelihood of anybody already using
these new features in a production setup that seems acceptable.

Fixing these issues made me notice that there's no regression test
covering the loading of historic snapshot from disk - so add one.

Backpatch to 9.4 where these features were introduced.
2014-11-12 18:52:49 +01:00
Noah Misch 28245b8424 Use just one database connection in the "tablespace" test.
On Windows, DROP TABLESPACE has a race condition when run concurrently
with other processes having opened files in the tablespace.  This led to
a rare failure on buildfarm member frogmouth.  Back-patch to 9.4, where
the reconnection was introduced.
2014-11-12 07:33:17 -05:00
Peter Eisentraut 8339f33d68 Message improvements 2014-11-11 20:02:30 -05:00
Robert Haas f1abd78be7 Remove incorrect comment.
This was introduced by commit 5ea86e6e65.

Peter Geoghegan
2014-11-11 18:41:29 -05:00
Tom Lane 2edfc021c6 Fix dependency searching for case where column is visited before table.
When the recursive search in dependency.c visits a column and then later
visits the whole table containing the column, it needs to propagate the
drop-context flags for the table to the existing target-object entry for
the column.  Otherwise we might refuse the DROP (if not CASCADE) on the
incorrect grounds that there was no automatic drop pathway to the column.
Remarkably, this has not been reported before, though it's possible at
least when an extension creates both a datatype and a table using that
datatype.

Rather than just marking the column as allowed to be dropped, it might
seem good to skip the DROP COLUMN step altogether, since the later DROP
of the table will surely get the job done.  The problem with that is that
the datatype would then be dropped before the table (since the whole
situation occurred because we visited the datatype, and then recursed to
the dependent column, before visiting the table).  That seems pretty risky,
and the case is rare enough that it doesn't seem worth expending a lot of
effort or risk to make the drops happen in a safe order.  So we just play
dumb and delete the column separately according to the existing drop
ordering rules.

Per report from Petr Jelinek, though this is different from his proposed
patch.

Back-patch to 9.1, where extensions were introduced.  There's currently
no evidence that such cases can arise before 9.1, and in any case we would
also need to back-patch cb5c2ba2d8 to 9.0
if we wanted to back-patch this.
2014-11-11 17:00:11 -05:00
Fujii Masao 1871c89202 Add generate_series(numeric, numeric).
Платон Малюгин
Reviewed by Michael Paquier, Ali Akbar and Marti Raudsepp
2014-11-11 21:44:46 +09:00
Fujii Masao a1b395b6a2 Add GUC and storage parameter to set the maximum size of GIN pending list.
Previously the maximum size of GIN pending list was controlled only by
work_mem. But the reasonable value of work_mem and the reasonable size
of the list are basically not the same, so it was not appropriate to
control both of them by only one GUC, i.e., work_mem. This commit
separates new GUC, pending_list_cleanup_size, from work_mem to allow
users to control only the size of the list.

Also this commit adds pending_list_cleanup_size as new storage parameter
to allow users to specify the size of the list per index. This is useful,
for example, when users want to increase the size of the list only for
the GIN index which can be updated heavily, and decrease it otherwise.

Reviewed by Etsuro Fujita.
2014-11-11 21:08:21 +09:00
Alvaro Herrera a590f266e4 BRIN: fix bug in xlog backup block counting
The code that generates the BRIN_XLOG_UPDATE removes the buffer
reference when the page that's target for the updated tuple is freshly
initialized.  This is a pretty usual optimization, but was breaking the
case where the revmap buffer, which is referenced in the same WAL
record, is getting a backup block: the replay code was using backup
block index 1, which is not valid when the update target buffer gets
pruned; the revmap buffer gets assigned 0 instead.  Make sure to use the
correct backup block index for revmap when replaying.

Bug reported by Fujii Masao.
2014-11-10 18:13:49 -03:00
Robert Haas c8df9477f8 Fix potential NULL-pointer dereference.
Commit 2781b4bea7 arranged to defer
the setup of after-trigger-related data structures, but
AfterTriggerPendingOnRel didn't get the memo.
2014-11-10 15:22:46 -05:00
Tom Lane bf7ca15875 Ensure that RowExprs and whole-row Vars produce the expected column names.
At one time it wasn't terribly important what column names were associated
with the fields of a composite Datum, but since the introduction of
operations like row_to_json(), it's important that looking up the rowtype
ID embedded in the Datum returns the column names that users would expect.
That did not work terribly well before this patch: you could get the column
names of the underlying table, or column aliases from any level of the
query, depending on minor details of the plan tree.  You could even get
totally empty field names, which is disastrous for cases like row_to_json().

To fix this for whole-row Vars, look to the RTE referenced by the Var, and
make sure its column aliases are applied to the rowtype associated with
the result Datums.  This is a tad scary because we might have to return
a transient RECORD type even though the Var is declared as having some
named rowtype.  In principle it should be all right because the record
type will still be physically compatible with the named rowtype; but
I had to weaken one Assert in ExecEvalConvertRowtype, and there might be
third-party code containing similar assumptions.

Similarly, RowExprs have to be willing to override the column names coming
from a named composite result type and produce a RECORD when the column
aliases visible at the site of the RowExpr differ from the underlying
table's column names.

In passing, revert the decision made in commit 398f70ec07 to add
an alias-list argument to ExecTypeFromExprList: better to provide that
functionality in a separate function.  This also reverts most of the code
changes in d685814835, which we don't need because we're no longer
depending on the tupdesc found in the child plan node's result slot to be
blessed.

Back-patch to 9.4, but not earlier, since this solution changes the results
in some cases that users might not have realized were buggy.  We'll apply a
more restricted form of this patch in older branches.
2014-11-10 15:21:09 -05:00
Alvaro Herrera 1e0b4365c2 Further code and wording tweaks in BRIN
Besides a couple of typo fixes, per David Rowley, Thom Brown, and Amit
Langote, and mentions of BRIN in the general CREATE INDEX page again per
David, this includes silencing MSVC compiler warnings (thanks Microsoft)
and an additional variable initialization per Coverity scanner.
2014-11-10 15:56:08 -03:00
Kevin Grittner 96a73fcdac Fix compiler warning for non-assert builds.
Reported by Peter Geoghegan
David Rowley
2014-11-10 09:42:46 -06:00
Alvaro Herrera b89ee54e20 Fix some coding issues in BRIN
Reported by David Rowley: variadic macros are a problem.  Get rid of
them using a trick suggested by Tom Lane: add extra parentheses where
needed.  In the future we might decide we don't need the calls at all
and remove them, but it seems appropriate to keep them while this code
is still new.

Also from David Rowley: brininsert() was trying to use a variable before
initializing it.  Fix by moving the brin_form_tuple call (which
initializes the variable) to within the locked section.

Reported by Peter Eisentraut: can't use "new" as a struct member name,
because C++ compilers will choke on it, as reported by cpluspluscheck.
2014-11-08 00:31:03 -03:00
Robert Haas 0b03e5951b Introduce custom path and scan providers.
This allows extension modules to define their own methods for
scanning a relation, and get the core code to use them.  It's
unclear as yet how much use this capability will find, but we
won't find out if we never commit it.

KaiGai Kohei, reviewed at various times and in various levels
of detail by Shigeru Hanada, Tom Lane, Andres Freund, Álvaro
Herrera, and myself.
2014-11-07 17:34:36 -05:00
Heikki Linnakangas 7250d8535b Fix building with WAL_DEBUG.
Now that the backup blocks are appended to the WAL record in xloginsert.c,
XLogInsert doesn't see them anymore and cannot remove them from the version
reconstructed for xlog_outdesc. This makes running with wal_debug=on more
expensive, as we now make (unnecessary) temporary copies of the backup
blocks, but it doesn't seem worth convoluting the code to keep that
optimization.

Reported by Alvaro Herrera.
2014-11-07 23:09:31 +02:00
Robert Haas 5ea86e6e65 Use the sortsupport infrastructure in more cases.
This removes some fmgr overhead from cases such as btree index builds.

Peter Geoghegan, reviewed by Andreas Karlsson and me.
2014-11-07 15:50:55 -05:00
Alvaro Herrera 7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00
Heikki Linnakangas 1961b1c131 Fix generation of SP-GiST vacuum WAL records.
I broke these in 8776faa81c. Backpatch to
9.4, where that was done.
2014-11-07 21:17:46 +02:00
Heikki Linnakangas 2effb72e68 Remove obsolete cases from GiST update redo code.
The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
existed since we got rid of old-style VACUUM FULL. I kept the code that sets
the flag, although it's not used for anything anymore, because it might
still be interesting information for debugging purposes that some tuples
have been deleted from a page.

Likewise, the code to turn the root page from non-leaf to leaf page was
removed when we got rid of old-style VACUUM FULL. Remove the code to replay
that action, too.
2014-11-07 15:13:02 +02:00
Tom Lane d6e37b35cd Cope with more than 64K phrases in a thesaurus dictionary.
dict_thesaurus stored phrase IDs in uint16 fields, so it would get confused
and even crash if there were more than 64K entries in the configuration
file.  It turns out to be basically free to widen the phrase IDs to uint32,
so let's just do so.

This was complained of some time ago by David Boutin (in bug #7793);
he later submitted an informal patch but it was never acted on.
We now have another complaint (bug #11901 from Luc Ouellette) so it's
time to make something happen.

This is basically Boutin's patch, but for future-proofing I also added a
defense against too many words per phrase.  Note that we don't need any
explicit defense against overflow of the uint32 counters, since before that
happens we'd hit array allocation sizes that repalloc rejects.

Back-patch to all supported branches because of the crash risk.
2014-11-06 20:52:40 -05:00
Tom Lane 4875931938 Fix normalization of numeric values in JSONB GIN indexes.
The default JSONB GIN opclass (jsonb_ops) converts numeric data values
to strings for storage in the index.  It must ensure that numeric values
that would compare equal (such as 12 and 12.00) produce identical strings,
else index searches would have behavior different from regular JSONB
comparisons.  Unfortunately the function charged with doing this was
completely wrong: it could reduce distinct numeric values to the same
string, or reduce equivalent numeric values to different strings.  The
former type of error would only lead to search inefficiency, but the
latter type of error would cause index entries that should be found by
a search to not be found.

Repairing this bug therefore means that it will be necessary for 9.4 beta
testers to reindex GIN jsonb_ops indexes, if they care about getting
correct results from index searches involving numeric data values within
the comparison JSONB object.

Per report from Thomas Fanghaenel.
2014-11-06 11:41:06 -05:00
Fujii Masao 5332b8cec5 Prevent the unnecessary creation of .ready file for the timeline history file.
Previously .ready file was created for the timeline history file at the end
of an archive recovery even when WAL archiving was not enabled.
This creation is unnecessary and causes .ready file to remain infinitely.

This commit changes an archive recovery so that it creates .ready file for
the timeline history file only when WAL archiving is enabled.

Backpatch to all supported versions.
2014-11-06 21:24:40 +09:00
Heikki Linnakangas 2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Fujii Masao d2b8a2c7ec Fix typo in comment.
Etsuro Fujita
2014-11-06 20:04:11 +09:00
Fujii Masao 08309aaf74 Implement IF NOT EXIST for CREATE INDEX.
Fabrízio de Royes Mello, reviewed by Marti Raudsepp, Adam Brightwell and me.
2014-11-06 18:48:33 +09:00
Tom Lane 525a489915 Remove the last vestige of server-side autocommit.
Long ago we briefly had an "autocommit" GUC that turned server-side
autocommit on and off.  That behavior was removed in 7.4 after concluding
that it broke far too much client-side logic, and making clients cope with
both behaviors was impractical.  But the GUC variable was left behind, so
as not to break any client code that might be trying to read its value.
Enough time has now passed that we should remove the GUC completely.
Whatever vestigial backwards-compatibility benefit it had is outweighed by
the risk of confusion for newbies who assume it ought to do something,
as per a recent complaint from Wolfgang Wilhelm.

In passing, adjust what seemed to me a rather confusing documentation
reference to libpq's autocommit behavior.  libpq as such knows nothing
about autocommit, so psql is probably what was meant.
2014-11-05 19:35:23 -05:00
Robert Haas c30be9787b Fix thinko in commit 2bd9e412f9.
Obviously, every translation unit should not be declaring this
separately.  It needs to be PGDLLIMPORT as well, to avoid breaking
third-party code that uses any of the functions that the commit
mentioned above changed to macros.
2014-11-05 17:12:23 -05:00
Tom Lane 465d7e1882 Make CREATE TYPE print warnings if a datatype's I/O functions are volatile.
This is a followup to commit 43ac12c6e6,
which added regression tests checking that I/O functions of built-in
types are not marked volatile.  Complaining in CREATE TYPE should push
developers of add-on types to fix any misdeclared functions in their
types.  It's just a warning not an error, to avoid creating upgrade
problems for what might be just cosmetic mis-markings.

Aside from adding the warning code, fix a number of types that were
sloppily created in the regression tests.
2014-11-05 11:44:06 -05:00
Tom Lane 33f80f8480 Drop no-longer-needed buffers during ALTER DATABASE SET TABLESPACE.
The previous coding assumed that we could just let buffers for the
database's old tablespace age out of the buffer arena naturally.
The folly of that is exposed by bug #11867 from Marc Munro: the user could
later move the database back to its original tablespace, after which any
still-surviving buffers would match lookups again and appear to contain
valid data.  But they'd be missing any changes applied while the database
was in the new tablespace.

This has been broken since ALTER SET TABLESPACE was introduced, so
back-patch to all supported branches.
2014-11-04 13:24:06 -05:00
Heikki Linnakangas 5028f22f6e Switch to CRC-32C in WAL and other places.
The old algorithm was found to not be the usual CRC-32 algorithm, used by
Ethernet et al. We were using a non-reflected lookup table with code meant
for a reflected lookup table. That's a strange combination that AFAICS does
not correspond to any bit-wise CRC calculation, which makes it difficult to
reason about its properties. Although it has worked well in practice, seems
safer to use a well-known algorithm.

Since we're changing the algorithm anyway, we might as well choose a
different polynomial. The Castagnoli polynomial has better error-correcting
properties than the traditional CRC-32 polynomial, even if we had
implemented it correctly. Another reason for picking that is that some new
CPUs have hardware support for calculating CRC-32C, but not CRC-32, let
alone our strange variant of it. This patch doesn't add any support for such
hardware, but a future patch could now do that.

The old algorithm is kept around for tsquery and pg_trgm, which use the
values in indexes that need to remain compatible so that pg_upgrade works.
While we're at it, share the old lookup table for CRC-32 calculation
between hstore, ltree and core. They all use the same table, so might as
well.
2014-11-04 11:39:48 +02:00
Robert Haas 2bd9e412f9 Support frontend-backend protocol communication using a shm_mq.
A background worker can use pq_redirect_to_shm_mq() to direct protocol
that would normally be sent to the frontend to a shm_mq so that another
process may read them.

The receiving process may use pq_parse_errornotice() to parse an
ErrorResponse or NoticeResponse from the background worker and, if
it wishes, ThrowErrorData() to propagate the error (with or without
further modification).

Patch by me.  Review by Andres Freund.
2014-10-31 12:02:40 -04:00
Robert Haas f7102b0463 Extend dsm API with a new function dsm_unpin_mapping.
This reassociates a dynamic shared memory handle previous passed to
dsm_pin_mapping with the current resource owner, so that it will be
cleaned up at the end of the current query.

Patch by me.  Review of the function name by Andres Freund, Amit
Kapila, Jim Nasby, Petr Jelinek, and Álvaro Herrera.
2014-10-30 14:55:23 -04:00
Tom Lane fd0f651a86 Test IsInTransactionChain, not IsTransactionBlock, in vac_update_relstats.
As noted by Noah Misch, my initial cut at fixing bug #11638 didn't cover
all cases where ANALYZE might be invoked in an unsafe context.  We need to
test the result of IsInTransactionChain not IsTransactionBlock; which is
notationally a pain because IsInTransactionChain requires an isTopLevel
flag, which would have to be passed down through several levels of callers.
I chose to pass in_outer_xact (ie, the result of IsInTransactionChain)
rather than isTopLevel per se, as that seemed marginally more apropos
for the intermediate functions to know about.
2014-10-30 13:04:06 -04:00
Robert Haas 6057c212f3 "Pin", rather than "keep", dynamic shared memory mappings and segments.
Nobody seemed concerned about this naming when it originally went in,
but there's a pending patch that implements the opposite of
dsm_keep_mapping, and the term "unkeep" was judged unpalatable.
"unpin" has existing precedent in the PostgreSQL code base, and the
English language, so use this terminology instead.

Per discussion, back-patch to 9.4.
2014-10-30 11:35:55 -04:00
Tom Lane e0722d9cb5 Avoid corrupting tables when ANALYZE inside a transaction is rolled back.
VACUUM and ANALYZE update the target table's pg_class row in-place, that is
nontransactionally.  This is OK, more or less, for the statistical columns,
which are mostly nontransactional anyhow.  It's not so OK for the DDL hint
flags (relhasindex etc), which might get changed in response to
transactional changes that could still be rolled back.  This isn't a
problem for VACUUM, since it can't be run inside a transaction block nor
in parallel with DDL on the table.  However, we allow ANALYZE inside a
transaction block, so if the transaction had earlier removed the last
index, rule, or trigger from the table, and then we roll back the
transaction after ANALYZE, the table would be left in a corrupted state
with the hint flags not set though they should be.

To fix, suppress the hint-flag updates if we are InTransactionBlock().
This is safe enough because it's always OK to postpone hint maintenance
some more; the worst-case consequence is a few extra searches of pg_index
et al.  There was discussion of instead using a transactional update,
but that would change the behavior in ways that are not all desirable:
in most scenarios we're better off keeping ANALYZE's statistical values
even if the ANALYZE itself rolls back.  In any case we probably don't want
to change this behavior in back branches.

Per bug #11638 from Casey Shobe.  This has been broken for a good long
time, so back-patch to all supported branches.

Tom Lane and Michael Paquier, initial diagnosis by Andres Freund
2014-10-29 18:12:02 -04:00
Robert Haas 6cb4afff33 Avoid setup work for invalidation messages at start-of-(sub)xact.
Instead of initializing a new TransInvalidationInfo for every
transaction or subtransaction, we can just do it for those
transactions or subtransactions that actually need to queue
invalidation messages.  That also avoids needing to free those
entries at the end of a transaction or subtransaction that does
not generate any invalidation messages, which is by far the
common case.

Patch by me.  Review by Simon Riggs and Andres Freund.
2014-10-29 12:35:19 -04:00
Tom Lane a00d468e65 Remove obsolete commentary.
Since we got rid of non-MVCC catalog scans, the fourth reason given for
using a non-transactional update in index_update_stats() is obsolete.
The other three are still good, so we're not going to change the code,
but fix the comment.
2014-10-28 18:36:02 -04:00
Heikki Linnakangas 18f158ef69 Remove unnecessary assignment.
Reported by MauMau.
2014-10-28 20:26:20 +02:00
Heikki Linnakangas 22926e00f7 Fix two bugs in tsquery @> operator.
1. The comparison for matching terms used only the CRC to decide if there's
a match. Two different terms with the same CRC gave a match.

2. It assumed that if the second operand has more terms than the first, it's
never a match. That assumption is bogus, because there can be duplicate
terms in either operand.

Rewrite the implementation in a way that doesn't have those bugs.

Backpatch to all supported versions.
2014-10-27 10:50:41 +02:00
Tom Lane a4523c5aa5 Improve planning of btree index scans using ScalarArrayOpExpr quals.
Since we taught btree to handle ScalarArrayOpExpr quals natively (commit
9e8da0f757), the planner has always included
ScalarArrayOpExpr quals in index conditions if possible.  However, if the
qual is for a non-first index column, this could result in an inferior plan
because we can no longer take advantage of index ordering (cf. commit
807a40c551).  It can be better to omit the
ScalarArrayOpExpr qual from the index condition and let it be done as a
filter, so that the output doesn't need to get sorted.  Indeed, this is
true for the query introduced as a test case by the latter commit.

To fix, restructure get_index_paths and build_index_paths so that we
consider paths both with and without ScalarArrayOpExpr quals in non-first
index columns.  Redesign the API of build_index_paths so that it reports
what it found, saving useless second or third calls.

Report and patch by Andrew Gierth (though rather heavily modified by me).
Back-patch to 9.2 where this code was introduced, since the issue can
result in significant performance regressions compared to plans produced
by 9.1 and earlier.
2014-10-26 16:12:22 -04:00
Robert Haas 85bb81de53 Fix off-by-one error in 2781b4bea7.
Spotted by Tom Lane.
2014-10-24 08:18:28 -04:00
Alvaro Herrera b01a4f6838 Update README.tuplock
This file was documenting an older version of patch 0ac5ad5134; update
it to match what was really committed

Author: Florian Pflug
2014-10-23 20:51:58 -03:00
Tom Lane b34d6f03db Improve ispell dictionary's defenses against bad affix files.
Don't crash if an ispell dictionary definition contains flags but not
any compound affixes.  (This isn't a security issue since only superusers
can install affix files, but still it's a bad thing.)

Also, be more careful about detecting whether an affix-file FLAG command
is old-format (ispell) or new-format (myspell/hunspell).  And change the
error message about mixed old-format and new-format commands into something
intelligible.

Per bug #11770 from Emre Hasegeli.  Back-patch to all supported branches.
2014-10-23 13:12:00 -04:00
Robert Haas 2781b4bea7 Perform less setup work for AFTER triggers at transaction start.
Testing reveals that the memory allocation we do at transaction start
has small but measurable overhead on simple transactions.  To cut
down on that overhead, defer some of that work to the point when
AFTER triggers are first used, thus avoiding it altogether if they
never are.

Patch by me.  Review by Andres Freund.
2014-10-23 12:33:02 -04:00
Robert Haas 5ac372fc1a Add a function to get the authenticated user ID.
Previously, this was not exposed outside of miscinit.c.  It is needed
for the pending pg_background patch, and will also be needed for
parallelism.  Without it, there's no way for a background worker to
re-create the exact authentication environment that was present in the
process that started it, which could lead to security exposures.
2014-10-23 08:18:45 -04:00
Fujii Masao c7371c4a60 Prevent the already-archived WAL file from being archived again.
Previously the archive recovery always created .ready file for
the last WAL file of the old timeline at the end of recovery even when
it's restored from the archive and has .done file. That is, there was
the case where the WAL file had both .ready and .done files.
This caused the already-archived WAL file to be archived again.

This commit prevents the archive recovery from creating .ready file
for the last WAL file if it has .done file, in order to prevent it from
being archived again.

This bug was added when cascading replication feature was introduced,
i.e., the commit 5286105800.
So, back-patch to 9.2, where cascading replication was added.

Reviewed by Michael Paquier
2014-10-23 16:21:27 +09:00
Peter Eisentraut e64d3c5635 Minimize calls of pg_class_aclcheck to minimum necessary
In a couple of code paths, pg_class_aclcheck is called in succession
with multiple different modes set.  This patch combines those modes to
have a single call of this function and reduce a bit process overhead
for permission checking.

Author: Michael Paquier <michael@otacoo.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
2014-10-22 21:41:43 -04:00
Heikki Linnakangas 98b3743779 Update comment.
The _bt_tuplecompare() function mentioned in comment hasn't existed for a
long time.

Peter Geoghegan
2014-10-22 15:44:07 +03:00
Peter Eisentraut 6f04368cfc Allow input format xxxx-xxxx-xxxx for macaddr type
Author: Herwin Weststrate <herwin@quarantainenet.nl>
Reviewed-by: Ali Akbar <the.apaan@gmail.com>
2014-10-21 16:16:39 -04:00
Andres Freund 5e5b65f359 Don't duplicate log_checkpoint messages for both of restart and checkpoints.
The duplication originated in cdd46c765, where restartpoints were
introduced.

In LogCheckpointStart's case the duplication actually lead to the
compiler's format string checking not to be effective because the
format string wasn't constant.

Arguably these messages shouldn't be elog(), but ereport() style
messages. That'd even allow to translate the messages... But as
there's more mistakes of that kind in surrounding code, it seems
better to change that separately.
2014-10-21 01:01:56 +02:00
Andres Freund 11abd6c90f Renumber CHECKPOINT_* flags.
Commit 7dbb606938 added a new CHECKPOINT_FLUSH_ALL flag. As that
commit needed to be backpatched I didn't change the numeric values of
the existing flags as that could lead to nastly problems if any
external code issued checkpoints. That's not a concern on master, so
renumber them there.

Also add a comment about CHECKPOINT_FLUSH_ALL above
CreateCheckPoint().
2014-10-21 00:20:08 +02:00
Andres Freund 7dbb606938 Flush unlogged table's buffers when copying or moving databases.
CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE copy the source
database directory on the filesystem level. To ensure the on disk
state is consistent they block out users of the affected database and
force a checkpoint to flush out all data to disk. Unfortunately, up to
now, that checkpoint didn't flush out dirty buffers from unlogged
relations.

That bug means there could be leftover dirty buffers in either the
template database, or the database in its old location. Leading to
problems when accessing relations in an inconsistent state; and to
possible problems during shutdown in the SET TABLESPACE case because
buffers belonging files that don't exist anymore are flushed.

This was reported in bug #10675 by Maxim Boguk.

Fix by Pavan Deolasee, modified somewhat by me. Reviewed by MauMau and
Fujii Masao.

Backpatch to 9.1 where unlogged tables were introduced.
2014-10-20 23:43:46 +02:00
Tom Lane f330a6d140 Fix mishandling of FieldSelect-on-whole-row-Var in nested lateral queries.
If an inline-able SQL function taking a composite argument is used in a
LATERAL subselect, and the composite argument is a lateral reference,
the planner could fail with "variable not found in subplan target list",
as seen in bug #11703 from Karl Bartel.  (The outer function call used in
the bug report and in the committed regression test is not really necessary
to provoke the bug --- you can get it if you manually expand the outer
function into "LATERAL (SELECT inner_function(outer_relation))", too.)

The cause of this is that we generate the reltargetlist for the referenced
relation before doing eval_const_expressions() on the lateral sub-select's
expressions (cf find_lateral_references()), so what's scheduled to be
emitted by the referenced relation is a whole-row Var, not the simplified
single-column Var produced by optimizing the function's FieldSelect on the
whole-row Var.  Then setrefs.c fails to match up that lateral reference to
what's available from the outer scan.

Preserving the FieldSelect optimization in such cases would require either
major planner restructuring (to recursively do expression simplification
on sub-selects much earlier) or some amazingly ugly kluge to change the
reltargetlist of a possibly-already-planned relation.  It seems better
just to skip the optimization when the Var is from an upper query level;
the case is not so common that it's likely anyone will notice a few
wasted cycles.

AFAICT this problem only occurs for uplevel LATERAL references, so
back-patch to 9.3 where LATERAL was added.
2014-10-20 12:23:42 -04:00
Robert Haas bc279c92f0 Fix typos.
David Rowley
2014-10-20 10:33:16 -04:00
Robert Haas 0f565c0743 Fix typos.
Etsuro Fujita
2014-10-20 10:23:40 -04:00
Peter Eisentraut 7feaccc217 Allow setting effective_io_concurrency even on unsupported systems
This matches the behavior of other parameters that are unsupported on
some systems (e.g., ssl).

Also document the default value.
2014-10-18 21:35:46 -04:00
Bruce Momjian b87671f1b6 Shorten warning about hash creation
Also document that PITR is also affected.
2014-10-18 10:36:09 -04:00
Bruce Momjian 417f92484d interval: tighten precision specification
interval precision can only be specified after the "interval" keyword if
no units are specified.

Previously we incorrectly checked the units to see if the precision was
legal, causing confusion.

Report by Alvaro Herrera
2014-10-18 10:31:00 -04:00
Tom Lane 5ba062ee44 Avoid core dump in _outPathInfo() for Path without a parent RelOptInfo.
Nearly all Paths have parents, but a ResultPath representing an empty FROM
clause does not.  Avoid a core dump in such cases.  I believe this is only
a hazard for debugging usage, not for production, else we'd have heard
about it before.  Nonetheless, back-patch to 9.1 where the troublesome code
was introduced.  Noted while poking at bug #11703.
2014-10-17 22:33:52 -04:00
Tom Lane b2cbced9ee Support timezone abbreviations that sometimes change.
Up to now, PG has assumed that any given timezone abbreviation (such as
"EDT") represents a constant GMT offset in the usage of any particular
region; we had a way to configure what that offset was, but not for it
to be changeable over time.  But, as with most things horological, this
view of the world is too simplistic: there are numerous regions that have
at one time or another switched to a different GMT offset but kept using
the same timezone abbreviation.  Almost the entire Russian Federation did
that a few years ago, and later this month they're going to do it again.
And there are similar examples all over the world.

To cope with this, invent the notion of a "dynamic timezone abbreviation",
which is one that is referenced to a particular underlying timezone
(as defined in the IANA timezone database) and means whatever it currently
means in that zone.  For zones that use or have used daylight-savings time,
the standard and DST abbreviations continue to have the property that you
can specify standard or DST time and get that time offset whether or not
DST was theoretically in effect at the time.  However, the abbreviations
mean what they meant at the time in question (or most recently before that
time) rather than being absolutely fixed.

The standard abbreviation-list files have been changed to use this behavior
for abbreviations that have actually varied in meaning since 1970.  The
old simple-numeric definitions are kept for abbreviations that have not
changed, since they are a bit faster to resolve.

While this is clearly a new feature, it seems necessary to back-patch it
into all active branches, because otherwise use of Russian zone
abbreviations is going to become even more problematic than it already was.
This change supersedes the changes in commit 513d06ded et al to modify the
fixed meanings of the Russian abbreviations; since we've not shipped that
yet, this will avoid an undesirably incompatible (not to mention incorrect)
change in behavior for timestamps between 2011 and 2014.

This patch makes some cosmetic changes in ecpglib to keep its usage of
datetime lookup tables as similar as possible to the backend code, but
doesn't do anything about the increasingly obsolete set of timezone
abbreviation definitions that are hard-wired into ecpglib.  Whatever we
do about that will likely not be appropriate material for back-patching.
Also, a potential free() of a garbage pointer after an out-of-memory
failure in ecpglib has been fixed.

This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that
caused it to produce unexpected results near a timezone transition, if
both the "before" and "after" states are marked as standard time.  We'd
only ever thought about or tested transitions between standard and DST
time, but that's not what's happening when a zone simply redefines their
base GMT offset.

In passing, update the SGML documentation to refer to the Olson/zoneinfo/
zic timezone database as the "IANA" database, since it's now being
maintained under the auspices of IANA.
2014-10-16 15:22:10 -04:00
Tom Lane 90063a7612 Print planning time only in EXPLAIN ANALYZE, not plain EXPLAIN.
We've gotten enough push-back on that change to make it clear that it
wasn't an especially good idea to do it like that.  Revert plain EXPLAIN
to its previous behavior, but keep the extra output in EXPLAIN ANALYZE.
Per discussion.

Internally, I set this up as a separate flag ExplainState.summary that
controls printing of planning time and execution time.  For now it's
just copied from the ANALYZE option, but we could consider exposing it
to users.
2014-10-15 18:50:13 -04:00
Heikki Linnakangas e0d97d77bf Fix deadlock with LWLockAcquireWithVar and LWLockWaitForVar.
LWLockRelease should release all backends waiting with LWLockWaitForVar,
even when another backend has already been woken up to acquire the lock,
i.e. when releaseOK is false. LWLockWaitForVar can return as soon as the
protected value changes, even if the other backend will acquire the lock.
Fix that by resetting releaseOK to true in LWLockWaitForVar, whenever
adding itself to the wait queue.

This should fix the bug reported by MauMau, where the system occasionally
hangs when there is a lot of concurrent WAL activity and a checkpoint.
Backpatch to 9.4, where this code was added.
2014-10-14 10:06:47 +03:00
Bruce Momjian fe65280fea C comments: adjust execTuples.c for new structure
Report by Peter Geoghegan
2014-10-13 16:54:38 -04:00
Bruce Momjian d6c8e8d7ac Consistently use NULL for invalid GUC unit strings
Patch by Euler Taveira
2014-10-13 16:11:43 -04:00
Kevin Grittner 30d7ae3c76 Increase number of hash join buckets for underestimate.
If we expect batching at the very beginning, we size nbuckets for
"full work_mem" (see how many tuples we can get into work_mem,
while not breaking NTUP_PER_BUCKET threshold).

If we expect to be fine without batching, we start with the 'right'
nbuckets and track the optimal nbuckets as we go (without actually
resizing the hash table). Once we hit work_mem (considering the
optimal nbuckets value), we keep the value.

At the end of the first batch, we check whether (nbuckets !=
nbuckets_optimal) and resize the hash table if needed. Also, we
keep this value for all batches (it's OK because it assumes full
work_mem, and it makes the batchno evaluation trivial). So the
resize happens only once.

There could be cases where it would improve performance to allow
the NTUP_PER_BUCKET threshold to be exceeded to keep everything in
one batch rather than spilling to a second batch, but attempts to
generate such a case have so far been unsuccessful; that issue may
be addressed with a follow-on patch after further investigation.

Tomas Vondra with minor format and comment cleanup by me
Reviewed by Robert Haas, Heikki Linnakangas, and Kevin Grittner
2014-10-13 10:16:36 -05:00
Peter Eisentraut b7a08c8028 Message improvements 2014-10-12 01:06:35 -04:00
Tom Lane 4a50de1312 Fix bogus optimization in JSONB containment tests.
When determining whether one JSONB object contains another, it's okay to
make a quick exit if the first object has fewer pairs than the second:
because we de-duplicate keys within objects, it is impossible that the
first object has all the keys the second does.  However, the code was
applying this rule to JSONB arrays as well, where it does *not* hold
because arrays can contain duplicate entries.  The test was really in
the wrong place anyway; we should do it within JsonbDeepContains, where
it can be applied to nested objects not only top-level ones.

Report and test cases by Alexander Korotkov; fix by Peter Geoghegan and
Tom Lane.
2014-10-11 14:13:51 -04:00
Alvaro Herrera 7b1c2a0f20 Split builtins.h to a new header ruleutils.h
The new header contains many prototypes for functions in ruleutils.c
that are not exposed to the SQL level.

Reviewed by Andres Freund and Michael Paquier.
2014-10-08 18:10:47 -03:00
Robert Haas 7bb0e97407 Extend shm_mq API with new functions shm_mq_sendv, shm_mq_set_handle.
shm_mq_sendv sends a message to the queue assembled from multiple
locations.  This is expected to be used by forthcoming patches to
allow frontend/backend protocol messages to be sent via shm_mq, but
might be useful for other purposes as well.

shm_mq_set_handle associates a BackgroundWorkerHandle with an
already-existing shm_mq_handle.  This solves a timing problem when
creating a shm_mq to communicate with a newly-launched background
worker: if you attach to the queue first, and the background worker
fails to start, you might block forever trying to do I/O on the queue;
but if you start the background worker first, but then die before
attaching to the queue, the background worrker might block forever
trying to do I/O on the queue.  This lets you attach before starting
the worker (so that the worker is protected) and then associate the
BackgroundWorkerHandle later (so that you are also protected).

Patch by me, reviewed by Stephen Frost.
2014-10-08 14:38:31 -04:00
Alvaro Herrera df630b0dd5 Implement SKIP LOCKED for row-level locks
This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows.  While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.

Catalog version bumped because this patch changes the representation of
stored rules.

Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.

Author: Thomas Munro
2014-10-07 17:23:34 -03:00
Robert Haas c421efd213 Fix typo in elog message. 2014-10-07 00:08:59 -04:00
Peter Eisentraut 1ec4a970fe Translation updates 2014-10-05 23:23:50 -04:00
Robert Haas d0410d6603 Eliminate one background-worker-related flag variable.
Teach sigusr1_handler() to use the same test for whether a worker
might need to be started as ServerLoop().  Aside from being perhaps
a bit simpler, this prevents a potentially-unbounded delay when
starting a background worker.  On some platforms, select() doesn't
return when interrupted by a signal, but is instead restarted,
including a reset of the timeout to the originally-requested value.
If signals arrive often enough, but no connection requests arrive,
sigusr1_handler() will be executed repeatedly, but the body of
ServerLoop() won't be reached.  This change ensures that, even in
that case, background workers will eventually get launched.

This is far from a perfect fix; really, we need select() to return
control to ServerLoop() after an interrupt, either via the self-pipe
trick or some other mechanism.  But that's going to require more
work and discussion, so let's do this for now to at least mitigate
the damage.

Per investigation of test_shm_mq failures on buildfarm member anole.
2014-10-04 21:25:41 -04:00
Tom Lane 513d06ded1 Update time zone data files to tzdata release 2014h.
Most zones in the Russian Federation are subtracting one or two hours
as of 2014-10-26.  Update the meanings of the abbreviations IRKT, KRAT,
MAGT, MSK, NOVT, OMST, SAKT, VLAT, YAKT, YEKT to match.

The IANA timezone database has adopted abbreviations of the form AxST/AxDT
for all Australian time zones, reflecting what they believe to be current
majority practice Down Under.  These names do not conflict with usage
elsewhere (other than ACST for Acre Summer Time, which has been in disuse
since 1994).  Accordingly, adopt these names into our "Default" timezone
abbreviation set.  The "Australia" abbreviation set now contains only
CST,EAST,EST,SAST,SAT,WST, all of which are thought to be mostly historical
usage.  Note that SAST has also been changed to be South Africa Standard
Time in the "Default" abbreviation set.

Add zone abbreviations SRET (Asia/Srednekolymsk) and XJT (Asia/Urumqi),
and use WSST/WSDT for western Samoa.

Also a DST law change in the Turks & Caicos Islands (America/Grand_Turk),
and numerous corrections for historical time zone data.
2014-10-04 14:18:19 -04:00
Stephen Frost 78d72563ef Fix CreatePolicy, pg_dump -v; psql and doc updates
Peter G pointed out that valgrind was, rightfully, complaining about
CreatePolicy() ending up copying beyond the end of the parsed policy
name.  Name is a fixed-size type and we need to use namein (through
DirectFunctionCall1()) to flush out the entire array before we pass
it down to heap_form_tuple.

Michael Paquier pointed out that pg_dump --verbose was missing a
newline and Fabrízio de Royes Mello further pointed out that the
schema was also missing from the messages, so fix those also.

Also, based on an off-list comment from Kevin, rework the psql \d
output to facilitate copy/pasting into a new CREATE or ALTER POLICY
command.

Lastly, improve the pg_policies view and update the documentation for
it, along with a few other minor doc corrections based on an off-list
discussion with Adam Brightwell.
2014-10-03 16:31:53 -04:00
Alvaro Herrera 1021bd6a89 Don't balance vacuum cost delay when per-table settings are in effect
When there are cost-delay-related storage options set for a table,
trying to make that table participate in the autovacuum cost-limit
balancing algorithm produces undesirable results: instead of using the
configured values, the global values are always used,
as illustrated by Mark Kirkwood in
http://www.postgresql.org/message-id/52FACF15.8020507@catalyst.net.nz

Since the mechanism is already complicated, just disable it for those
cases rather than trying to make it cope.  There are undesirable
side-effects from this too, namely that the total I/O impact on the
system will be higher whenever such tables are vacuumed.  However, this
is seen as less harmful than slowing down vacuum, because that would
cause bloat to accumulate.  Anyway, in the new system it is possible to
tweak options to get the precise behavior one wants, whereas with the
previous system one was simply hosed.

This has been broken forever, so backpatch to all supported branches.
This might affect systems where cost_limit and cost_delay have been set
for individual tables.
2014-10-03 13:01:27 -03:00
Robert Haas 017b2e9822 Fix typos in comments.
Etsuro Fujita
2014-10-03 11:47:27 -04:00
Heikki Linnakangas 7690ddea0c Check for GiST index tuples that don't fit on a page.
The page splitting code would go into infinite recursion if you try to
insert an index tuple that doesn't fit even on an empty page.

Per analysis and suggested fix by Andrew Gierth. Fixes bug #11555, reported
by Bryan Seitz (analysis happened over IRC). Backpatch to all supported
versions.
2014-10-03 14:49:52 +03:00
Robert Haas 3acc10c997 Increase the number of buffer mapping partitions to 128.
Testing by Amit Kapila, Andres Freund, and myself, with and without
other patches that also aim to improve scalability, seems to indicate
that this change is a significant win over the current value and over
smaller values such as 64.  It's not clear how high we can push this
value before it starts to have negative side-effects elsewhere, but
going this far looks OK.
2014-10-02 13:58:50 -04:00
Tom Lane 5a6c168c78 Fix some more problems with nested append relations.
As of commit a87c72915 (which later got backpatched as far as 9.1),
we're explicitly supporting the notion that append relations can be
nested; this can occur when UNION ALL constructs are nested, or when
a UNION ALL contains a table with inheritance children.

Bug #11457 from Nelson Page, as well as an earlier report from Elvis
Pranskevichus, showed that there were still nasty bugs associated with such
cases: in particular the EquivalenceClass mechanism could try to generate
"join" clauses connecting an appendrel child to some grandparent appendrel,
which would result in assertion failures or bogus plans.

Upon investigation I concluded that all current callers of
find_childrel_appendrelinfo() need to be fixed to explicitly consider
multiple levels of parent appendrels.  The most complex fix was in
processing of "broken" EquivalenceClasses, which are ECs for which we have
been unable to generate all the derived equality clauses we would like to
because of missing cross-type equality operators in the underlying btree
operator family.  That code path is more or less entirely untested by
the regression tests to date, because no standard opfamilies have such
holes in them.  So I wrote a new regression test script to try to exercise
it a bit, which turned out to be quite a worthwhile activity as it exposed
existing bugs in all supported branches.

The present patch is essentially the same as far back as 9.2, which is
where parameterized paths were introduced.  In 9.0 and 9.1, we only need
to back-patch a small fragment of commit 5b7b5518d, which fixes failure to
propagate out the original WHERE clauses when a broken EC contains constant
members.  (The regression test case results show that these older branches
are noticeably stupider than 9.2+ in terms of the quality of the plans
generated; but we don't really care about plan quality in such cases,
only that the plan not be outright wrong.  A more invasive fix in the
older branches would not be a good idea anyway from a plan-stability
standpoint.)
2014-10-01 19:31:12 -04:00
Heikki Linnakangas 5fa6c81a43 Remove num_xloginsert_locks GUC, replace with a #define
I left the GUC in place for the beta period, so that people could experiment
with different values. No-one's come up with any data that a different value
would be better under some circumstances, so rather than try to document to
users what the GUC, let's just hard-code the current value, 8.
2014-10-01 16:42:26 +03:00
Andres Freund a39e78b710 Block signals while computing the sleep time in postmaster's main loop.
DetermineSleepTime() was previously called without blocked
signals. That's not good, because it allows signal handlers to
interrupt its workings.

DetermineSleepTime() was added in 9.3 with the addition of background
workers (da07a1e856), where it only read from
BackgroundWorkerList.

Since 9.4, where dynamic background workers were added (7f7485a0cd),
the list is also manipulated in DetermineSleepTime(). That's bad
because the list now can be persistently corrupted if modified by both
a signal handler and DetermineSleepTime().

This was discovered during the investigation of hangs on buildfarm
member anole. It's unclear whether this bug is the source of these
hangs or not, but it's worth fixing either way. I have confirmed that
it can cause crashes.

It luckily looks like this only can cause problems when bgworkers are
actively used.

Discussion: 20140929193733.GB14400@awork2.anarazel.de

Backpatch to 9.3 where background workers were introduced.
2014-10-01 15:19:40 +02:00
Andres Freund 0ef3c29a4b Improve documentation about binary/textual output mode for output plugins.
Also improve related error message as it contributed to the confusion.

Discussion: CAB7nPqQrqFzjqCjxu4GZzTrD9kpj6HMn9G5aOOMwt1WZ8NfqeA@mail.gmail.com,
    CAB7nPqQXc_+g95zWnqaa=mVQ4d3BVRs6T41frcEYi2ocUrR3+A@mail.gmail.com

Per discussion between Michael Paquier, Robert Haas and Andres Freund

Backpatch to 9.4 where logical decoding was introduced.
2014-10-01 13:22:17 +02:00