Commit Graph

24039 Commits

Author SHA1 Message Date
Peter Eisentraut fc8745070a PL/Python: Make build on OS X more flexible
The PL/Python build on OS X was previously hardcoded to use the system
installation of Python, ignoring whatever was specified to configure.
Except that it would use the header files from configure, which could
lead to mismatches.  It was not possible to build against a custom
Python installation.

Now, we check in configure how the specified Python installation was
built and use that, supporting framework and non-framework builds.
2013-01-05 08:56:14 -05:00
Peter Eisentraut 7e938e3c56 Revert "PL/Python: Remove workaround for returning booleans in Python <2.3"
This reverts commit be0dfbad36.

The previous information that Py_RETURN_TRUE and Py_RETURN_FALSE are
supported in Python 2.3 is wrong.  They require Python 2.4.  Update the
comment about that.
2013-01-05 08:50:58 -05:00
Peter Eisentraut 49e7a26d67 Make some spelling more consistent 2013-01-05 08:25:21 -05:00
Tom Lane 94afbd5831 Invent a "one-shot" variant of CachedPlans for better performance.
SPI_execute() and related functions create a CachedPlan, execute it once,
and immediately discard it, so that the functionality offered by
plancache.c is of no value in this code path.  And performance measurements
show that the extra data copying and invalidation checking done by
plancache.c slows down simple queries by 10% or more compared to 9.1.
However, enough of the SPI code is shared with functions that do need plan
caching that it seems impractical to bypass plancache.c altogether.
Instead, let's invent a variant version of cached plans that preserves
99% of the API but doesn't offer any of the actual functionality, nor the
overhead.  This puts SPI_execute() performance back on par, or maybe even
slightly better, than it was before.  This change should resolve recent
complaints of performance degradation from Dong Ye, Pavel Stehule, and
others.

By avoiding data copying, this change also reduces the amount of memory
needed to execute many-statement SPI_execute() strings, as for instance in
a recent complaint from Tomas Vondra.

An additional benefit of this change is that multi-statement SPI_execute()
query strings are now processed fully serially, that is we complete
execution of earlier statements before running parse analysis and planning
on following ones.  This eliminates a long-standing POLA violation, in that
DDL that affects the behavior of a later statement will now behave as
expected.

Back-patch to 9.2, since this was a performance regression compared to 9.1.
(In 9.2, place the added struct fields so as to avoid changing the offsets
of existing fields.)

Heikki Linnakangas and Tom Lane
2013-01-04 17:42:19 -05:00
Heikki Linnakangas b0daba57bb Tolerate timeline switches while "pg_basebackup -X fetch" is running.
If you take a base backup from a standby server with "pg_basebackup -X
fetch", and the timeline switches while the backup is being taken, the
backup used to fail with an error "requested WAL segment %s has already
been removed". This is because the server-side code that sends over the
required WAL files would not construct the WAL filename with the correct
timeline after a switch.

Fix that by using readdir() to scan pg_xlog for all the WAL segments in the
range, regardless of timeline.

Also, include all timeline history files in the backup, if taken with
"-X fetch". That fixes another related bug: If a timeline switch happened
just before the backup was initiated in a standby, the WAL segment
containing the initial checkpoint record contains WAL from the older
timeline too. Recovery will not accept that without a timeline history file
that lists the older timeline.

Backpatch to 9.2. Versions prior to that were not affected as you could not
take a base backup from a standby before 9.2.
2013-01-03 19:51:00 +02:00
Heikki Linnakangas ee994272ca Delay reading timeline history file until it's fetched from master.
Streaming replication can fetch any missing timeline history files from the
master, but recovery would read the timeline history file for the target
timeline before reading the checkpoint record, and before walreceiver has
had a chance to fetch it from the master. Delay reading it, and the sanity
checks involving timeline history, until after reading the checkpoint
record.

There is at least one scenario where this makes a difference: if you take
a base backup from a standby server right after a timeline switch, the
WAL segment containing the initial checkpoint record will begin with an
older timeline ID. Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
2013-01-03 10:41:58 +02:00
Alvaro Herrera 84f6fb81b8 Fix IsUnderPostmaster/EXEC_BACKEND confusion 2013-01-02 18:39:20 -03:00
Alvaro Herrera 15658911d9 Set MaxBackends only on bootstrap and standalone modes
... not on auxiliary processes.  I managed to overlook the fact that I
had disabled assertions on my HEAD checkout long ago.

Hopefully this will turn the buildfarm green again, and put an end to
today's silliness.
2013-01-02 17:57:56 -03:00
Magnus Hagander 794397ae1d Move tar function headers to pgtar.h
This makes it possible to include them only where they are used, so
we can avoid the conflict of the uid_t and gid_t datatypes that happened
in plperl (since plperl doesn't need the tar functions)
2013-01-02 20:34:08 +01:00
Alvaro Herrera dfbba2c86c Make sure MaxBackends is always set
Auxiliary and bootstrap processes weren't getting it, causing initdb to
fail completely.
2013-01-02 14:39:11 -03:00
Alvaro Herrera cdbc0ca48c Fix background workers for EXEC_BACKEND
Commit da07a1e8 was broken for EXEC_BACKEND because I failed to realize
that the MaxBackends recomputation needed to be duplicated by
subprocesses in SubPostmasterMain.  However, instead of having the value
be recomputed at all, it's better to assign the correct value at
postmaster initialization time, and have it be propagated to exec'ed
backends via BackendParameters.

MaxBackends stays as zero until after modules in
shared_preload_libraries have had a chance to register bgworkers, since
the value is going to be untrustworthy till that's finished.

Heikki Linnakangas and Álvaro Herrera
2013-01-02 12:01:14 -03:00
Heikki Linnakangas d194d7a526 Fix bug in streaming replication over multiple tli switches.
After receiving some WAL over streaming replication, try to open the file
from the timeline we're currently recieving, not recoveryTargetTLI. They
are usually the same, which is why wasn't noticed before, but you'd get
an error if there have been more than one timeline switch between the
current point in WAL and the recovery target.
2013-01-02 14:35:15 +02:00
Heikki Linnakangas 4ffd589f44 Fix silly typo in code, which broke the check for reaching consistency. 2013-01-02 13:44:59 +02:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Magnus Hagander b10e9ae9f3 Add new file to MSVC build system as well 2013-01-01 18:29:48 +01:00
Magnus Hagander f5d4bdd3a5 Unify some tar functionality across different parts
Move some of the tar functionality that existed mostly duplicated
in both pg_dump and the walsender basebackup functionality into
port/tar.c instead, so it can be used from both. It will also be
used by pg_basebackup in the future, which would've caused a third
copy of it around.

Zoltan Boszormenyi and Magnus Hagander
2013-01-01 18:15:57 +01:00
Tom Lane 2ffa740be9 Fix ruleutils to cope with conflicts from adding/dropping/renaming columns.
In commit 11e131854f, we improved the
rule/view dumping code so that it would produce valid query representations
even if some of the tables involved in a query had been renamed since the
query was parsed.  This patch extends that idea to fix problems that occur
when individual columns are renamed, or added or dropped.  As before, the
core of the fix is to assign unique new aliases when a name conflict has
been created.  This is complicated by the JOIN USING feature, which
requires the same column alias to be used in both input relations, but we
can handle that with a sufficiently complex approach to assigning aliases.

A fortiori, this patch takes care of situations where the query didn't have
unique column names to begin with, such as in a recent complaint from Bryan
Nuse.  (Because of expansion of "SELECT *", re-parsing a dumped query can
require column name uniqueness even though the original text did not.)
2012-12-31 15:13:26 -05:00
Peter Eisentraut 0e690209ee Fix compiler warning about uninitialized variable 2012-12-31 00:13:40 -05:00
Heikki Linnakangas 60df192aea Keep timeline history files restored from archive in pg_xlog.
The cascading standby patch in 9.2 changed the way WAL files are treated
when restored from the archive. Before, they were restored under a temporary
filename, and not kept in pg_xlog, but after the patch, they were copied
under pg_xlog. This is necessary for a cascading standby to find them, but
it also means that if the archive goes offline and a standby is restarted,
it can recover back to where it was using the files in pg_xlog. It also
means that if you take an offline backup from a standby server, it includes
all the required WAL files in pg_xlog.

However, the same change was not made to timeline history files, so if the
WAL segment containing the checkpoint record contains a timeline switch, you
will still get an error if you try to restart recovery without the archive,
or recover from an offline backup taken from the standby.

With this patch, timeline history files restored from archive are copied
into pg_xlog like WAL files are, so that pg_xlog contains all the files
required to recover. This is a corner-case pre-existing issue in 9.2, but
even more important in master where it's possible for a standby to follow a
timeline switch through streaming replication. To make that possible, the
timeline history files must be present in pg_xlog.
2012-12-30 14:29:45 +02:00
Robert Haas 82b1b213ca Adjust more backend functions to return OID rather than void.
This is again intended to support extensions to the event trigger
functionality.  This may go a bit further than we need for that
purpose, but there's some value in being consistent, and the OID
may be useful for other purposes also.

Dimitri Fontaine
2012-12-29 07:55:37 -05:00
Alvaro Herrera 5ab3af46dd Remove obsolete XLogRecPtr macros
This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance.
These were useful for brevity when XLogRecPtrs were split in
xlogid/xrecoff; but now that they are simple uint64's, they are just
clutter.  The only downside to making this change would be ease of
backporting patches, but that has been negated by other substantive
changes to the involved code anyway.  The clarity of simpler expressions
makes the change worthwhile.

Most of the changes are mechanical, but in a couple of places, the patch
author chose to invert the operator sense, making the code flow more
logical (and more in line with preceding comments).

Author: Andres Freund
Eyeballed by Dimitri Fontaine and Alvaro Herrera
2012-12-28 13:06:15 -03:00
Alvaro Herrera 24eca7977e Assign InvalidXLogRecPtr instead of MemSet(0)
For consistency.

Author: Andres Freund
2012-12-27 18:33:03 -03:00
Alvaro Herrera eaa1f7220a Remove unused NextLogPage macro
Commit 061e7efb1b did away with its last caller, but neglected to remove
the actual definition.

Author: Andres Freund
2012-12-27 18:23:23 -03:00
Tom Lane 3f88b08003 Fix some minor issues in view pretty-printing.
Code review for commit 2f582f76b1945929ff07116cd4639747ce9bb8a1: don't use
a static variable for what ought to be a deparse_context field, fix
non-multibyte-safe test for spaces, avoid useless and potentially O(N^2)
(though admittedly with a very small constant) calculations of wrap
positions when we aren't going to wrap.
2012-12-24 17:52:19 -05:00
Simon Riggs c2b3218064 Update comments on rd_newRelfilenodeSubid.
Ensure comments accurately reflect state of code
given new understanding, and recent changes.
Include example code from Noah Misch to
illustrate how rd_newRelfilenodeSubid can be
reset deterministically. No code changes.
2012-12-24 17:07:06 +00:00
Simon Riggs ae9aba69a8 Keep rd_newRelfilenodeSubid across overflow.
Teach RelationCacheInvalidate() to keep rd_newRelfilenodeSubid across rel cache
message overflows, so that behaviour is now fully deterministic.

Noah Misch
2012-12-24 16:43:22 +00:00
Simon Riggs 42fa810c14 Fix more weird compiler messages caused
by unmatched function prototypes.

Andres Freund
2012-12-24 16:25:26 +00:00
Simon Riggs 2dcb2ebee2 Add function prototype from previous commit. 2012-12-24 09:18:42 +00:00
Robert Haas c504513f83 Adjust many backend functions to return OID rather than void.
Extracted from a larger patch by Dimitri Fontaine.  It is hoped that
this will provide infrastructure for enriching the new event trigger
functionality, but it seems possibly useful for other purposes as
well.
2012-12-23 18:37:58 -05:00
Tom Lane 31bc839724 Prevent failure when RowExpr or XmlExpr is parse-analyzed twice.
transformExpr() is required to cope with already-transformed expression
trees, for various ugly-but-not-quite-worth-cleaning-up reasons.  However,
some of its newer subroutines hadn't gotten the memo.  This accounts for
bug #7763 from Norbert Buchmuller: transformRowExpr() was overwriting the
previously determined type of a RowExpr during CREATE TABLE LIKE INCLUDING
INDEXES.  Additional investigation showed that transformXmlExpr had the
same kind of problem, but all the other cases seem to be safe.

Andres Freund and Tom Lane
2012-12-23 14:07:24 -05:00
Heikki Linnakangas 1ff92eea14 Fix sloppiness in the timeline switch over streaming replication patch.
Here's another attempt at fixing the logic that decides how far the WAL can
be streamed, which was still broken if the timeline changed while streaming.
You would get an assertion failure. The way the logic is now written is more
readable, too.

Thom Brown reported the assertion failure.
2012-12-21 20:08:12 +02:00
Heikki Linnakangas 36e4456d78 Fix race condition if a file is removed while pg_basebackup is running.
If a relation file was removed when the server-side counterpart of
pg_basebackup was just about to open it to send it to the client, you'd
get a "could not open file" error. Fix that.

Backpatch to 9.1, this goes back to when pg_basebackup was introduced.
2012-12-21 15:34:15 +02:00
Heikki Linnakangas d57a97343e Forgot to remove extern declaration of GetRecoveryTargetTLI()
Fujii Masao
2012-12-21 09:29:03 +02:00
Peter Eisentraut 740ee42da5 Make some messages more consistent in style 2012-12-21 00:10:46 -05:00
Peter Eisentraut a0bfb7b36e Fix grammatical mistake in error message 2012-12-20 23:36:13 -05:00
Tom Lane 343c2a865b Fix pg_extension_config_dump() to handle update cases more sanely.
If pg_extension_config_dump() is executed again for a table already listed
in the extension's extconfig, the code was blindly making a new array entry.
This does not seem useful.  Fix it to replace the existing array entry
instead, so that it's possible for extension update scripts to alter the
filter conditions for configuration tables.

In addition, teach ALTER EXTENSION DROP TABLE to check for an extconfig
entry for the target table, and remove it if present.  This is not a 100%
solution because it's allowed for an extension update script to just
summarily DROP a member table, and that code path doesn't go through
ExecAlterExtensionContentsStmt.  We could probably make that case clean
things up if we had to, but it would involve sticking a very ugly wart
somewhere in the guts of dependency.c.  Since on the whole it seems quite
unlikely that extension updates would want to remove pre-existing
configuration tables, making the case possible with an explicit command
seems sufficient.

Per bug #7756 from Regina Obe.  Back-patch to 9.1 where extensions were
introduced.
2012-12-20 16:31:42 -05:00
Heikki Linnakangas 343ee00b73 Fix recycling of WAL segments after switching timeline during recovery.
This was broken before, we would recycle old WAL segments on wrong timeline
after the recovery target timeline had changed, but my recent commit to
not initialize ThisTimeLineID at all in a standby's checkpointer process
broke this completely.

The problem is that when installing a recycled WAL segment as a future one,
ThisTimeLineID is used to construct the filename. To fix, always update
ThisTimeLineID to the current timeline being recovered, before recycling
WAL segments at a restartpoint.

This still leaves a small window where we might install WAL segments under
wrong timeline ID, if the timeline is changed just as we're about to start
recycling. Also, even if we're replaying timeline X at the momnent, there's
no guarantee that we'll need as many WAL segments on that timeline as we
recycle. We might be just about to reach the point where we switch to next
timeline, so might only need one more WAL segment on the current timeline.
We'll live with the waste in that situation.

Bug pointed out by Fujii Masao. 9.1 and 9.2 had the same issue, when
recovery target timeline was changed, but I committed a slightly different
version of this patch on those branches.
2012-12-20 22:00:58 +02:00
Bruce Momjian dc9896a245 Avoid using NAMEDATALEN in pg_upgrade
Because the client encoding might not match the server encoding,
pg_upgrade can't allocate NAMEDATALEN bytes for storage of database,
relation, and namespace identifiers.  Instead pg_strdup() the memory and
free it.

Also add C comment in initdb.c about safe NAMEDATALEN usage.
2012-12-20 13:56:31 -05:00
Heikki Linnakangas af275a12df Follow TLI of last replayed record, not recovery target TLI, in walsenders.
Most of the time, the last replayed record comes from the recovery target
timeline, but there is a corner case where it makes a difference. When
the startup process scans for a new timeline, and decides to change recovery
target timeline, there is a window where the recovery target TLI has already
been bumped, but there are no WAL segments from the new timeline in pg_xlog
yet. For example, if we have just replayed up to point 0/30002D8, on
timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog
that contains the WAL up to that point. When recovery switches recovery
target timeline to 2, a walsender can immediately try to read WAL from
0/30002D8, from timeline 2, so it will try to open WAL file
000000020000000000000003. However, that doesn't exist yet - the startup
process hasn't copied that file from the archive yet nor has the walreceiver
streamed it yet, so walsender fails with error "requested WAL segment
000000020000000000000003 has already been removed". That's harmless, in that
the standby will try to reconnect later and by that time the segment is
already created, but error messages that should be ignored are not good.

To fix that, have walsender track the TLI of the last replayed record,
instead of the recovery target timeline. That way walsender will not try to
read anything from timeline 2, until the WAL segment has been created and at
least one record has been replayed from it. The recovery target timeline is
now xlog.c's internal affair, it doesn't need to be exposed in shared memory
anymore.

This fixes the error reported by Thom Brown. depesz the same error message,
but I'm not sure if this fixes his scenario.
2012-12-20 14:39:04 +02:00
Heikki Linnakangas 1a11d4609e Don't set ThisTimeLineID in checkpointer & bgwriter during recovery.
We used to set it to the current recovery target timeline, but the recovery
target timeline can change during recovery, leaving ThisTimeLineID at an
old value. That seems worse than always leaving it at zero to begin with.

AFAICS there was no good reason to set it in the first place. ThisTimeLineID
is not needed in checkpointer or bgwriter process, until it's time to write
the end-of-recovery checkpoint, and at that point ThisTimeLineID is updated
anyway.
2012-12-20 14:39:04 +02:00
Heikki Linnakangas e43f947bf3 Check if we've reached end-of-backup point also if no redo is required.
If you restored from a backup taken from a standby, and the last record in
the backup is the checkpoint record, ie. there is no redo required except
for the checkpoint record, we would fail to notice that we've reached the
end-of-backup point, and the database is consistent. The result was an
error "WAL ends before end of online backup". To fix, move the
have-we-reached-end-of-backup check into CheckRecoveryConsistency(), which
is already responsible for similar checks with minRecoveryPoint, and is
called in the right places.

Backpatch to 9.2, this check and bug did not exist before that.
2012-12-19 14:22:00 +02:00
Peter Eisentraut f2b88080db Rename SQL feature S403 to ARRAY_MAX_CARDINALITY
In an earlier version of the standard, this was called just
"MAX_CARDINALITY".
2012-12-19 07:14:27 -05:00
Peter Eisentraut 6925e38dad pg_basebackup: Small message punctuation improvements 2012-12-19 07:01:11 -05:00
Andrew Dunstan 9ac749ceb5 Don't include postgres.h in postgres_fe.h for cpluspluscheck.
Error exposed by recent Assert changes.

Complaint from Peter Eisentraut.
2012-12-18 16:30:14 -05:00
Peter Eisentraut 1a5f04dd7e Remove allow_nonpic_in_shlib
This was used in a time when a shared libperl or libpython was difficult
to come by.  That is obsolete, and the idea behind the flag was never
fully portable anyway and will likely fail on more modern CPU
architectures.
2012-12-18 01:13:59 -05:00
Tom Lane 6919b7e329 Fix failure to ignore leftover temp tables after a server crash.
During crash recovery, we remove disk files belonging to temporary tables,
but the system catalog entries for such tables are intentionally not
cleaned up right away.  Instead, the first backend that uses a temp schema
is expected to clean out any leftover objects therein.  This approach
requires that we be careful to ignore leftover temp tables (since any
actual access attempt would fail), *even if their BackendId matches our
session*, if we have not yet established use of the session's corresponding
temp schema.  That worked fine in the past, but was broken by commit
debcec7dc3 which incorrectly removed the
rd_islocaltemp relcache flag.  Put it back, and undo various changes
that substituted tests like "rel->rd_backend == MyBackendId" for use
of a state-aware flag.  Per trouble report from Heikki Linnakangas.

Back-patch to 9.1 where the erroneous change was made.  In the back
branches, be careful to add rd_islocaltemp in a spot in the struct that
was alignment padding before, so as not to break existing add-on code.
2012-12-17 20:15:32 -05:00
Tom Lane c299477229 Fix filling of postmaster.pid in bootstrap/standalone mode.
We failed to ever fill the sixth line (LISTEN_ADDR), which caused the
attempt to fill the seventh line (SHMEM_KEY) to fail, so that the shared
memory key never got added to the file in standalone mode.  This has been
broken since we added more content to our lock files in 9.1.

To fix, tweak the logic in CreateLockFile to add an empty LISTEN_ADDR
line in standalone mode.  This is a tad grotty, but since that function
already knows almost everything there is to know about the contents of
lock files, it doesn't seem that it's any better to hack it elsewhere.

It's not clear how significant this bug really is, since a standalone
backend should never have any children and thus it seems not critical
to be able to check the nattch count of the shmem segment externally.
But I'm going to back-patch the fix anyway.

This problem had escaped notice because of an ancient (and in hindsight
pretty dubious) decision to suppress LOG-level messages by default in
standalone mode; so that the elog(LOG) complaint in AddToDataDirLockFile
that should have warned of the problem didn't do anything.  Fixing that
is material for a separate patch though.
2012-12-16 15:02:49 -05:00
Andrew Dunstan 3717f0837b Tidy up from frontend Assert change.
Quiet compiler warnings noted by Peter Eisentraut.
2012-12-16 12:22:57 -05:00
Magnus Hagander c1f856a17f Properly copy fmgroids.h after clean on Win32
Craig Ringer
2012-12-16 14:56:51 +01:00
Andrew Dunstan 1c382655ad Provide Assert() for frontend code.
Per discussion on-hackers. psql is converted to use the new code.

Follows a suggestion from Heikki Linnakangas.
2012-12-14 18:03:07 -05:00
Robert Haas 75758a6ff0 Update comment in heapgetpage() regarding PD_ALL_VISIBLE vs. Hot Standby.
Pavan Deolasee, slightly modified by me
2012-12-14 15:44:38 -05:00
Peter Eisentraut fdb67eb2b6 NLS: Use msgmerge --previous option
It provides some additional help to translators.
2012-12-13 23:12:12 -05:00
Heikki Linnakangas abfd192b1b Allow a streaming replication standby to follow a timeline switch.
Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary doesn't exactly match the standby. The
situation where it doesn't match is when you have a master, and two
standbys, and you promote one of the standbys to become new master.
Promoting bumps up the timeline ID, and after that bump, the other standby
would refuse to continue.

There's significantly more timeline related logic in streaming replication
now. First of all, when a standby connects to primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command TIMELINE_HISTORY,
and stored in standby's pg_xlog directory. Using the timeline history files,
the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.

START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of timeline reached, putting the walsender
back into accepting a replication command.

Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.
2012-12-13 19:17:32 +02:00
Heikki Linnakangas 527668717a Make xlog_internal.h includable in frontend context.
This makes unnecessary the ugly hack used to #include postgres.h in
pg_basebackup.

Based on Alvaro Herrera's patch
2012-12-13 14:59:13 +02:00
Heikki Linnakangas 6264cd3d69 In multi-insert, don't go into infinite loop on a huge tuple and fillfactor.
If a tuple is larger than page size minus space reserved for fillfactor,
heap_multi_insert would never find a page that it fits in and repeatedly ask
for a new page from RelationGetBufferForTuple. If a tuple is too large to
fit on any page, taking fillfactor into account, RelationGetBufferForTuple
will always expand the relation. In a normal insert, heap_insert will accept
that and put the tuple on the new page. heap_multi_insert, however, does a
fillfactor check of its own, and doesn't accept the newly-extended page
RelationGetBufferForTuple returns, even though there is no other choice to
make the tuple fit.

Fix that by making the logic in heap_multi_insert more like the heap_insert
logic. The first tuple is always put on the page RelationGetBufferForTuple
gives us, and the fillfactor check is only applied to the subsequent tuples.

Report from David Gould, although I didn't use his patch.
2012-12-12 13:54:42 +02:00
Tom Lane 691c5ebf79 Add defenses against integer overflow in dynahash numbuckets calculations.
The dynahash code requires the number of buckets in a hash table to fit
in an int; but since we calculate the desired hash table size dynamically,
there are various scenarios where we might calculate too large a value.
The resulting overflow can lead to infinite loops, division-by-zero
crashes, etc.  I (tgl) had previously installed some defenses against that
in commit 299d171652, but that covered only one
call path.  Moreover it worked by limiting the request size to work_mem,
but in a 64-bit machine it's possible to set work_mem high enough that the
problem appears anyway.  So let's fix the problem at the root by installing
limits in the dynahash.c functions themselves.

Trouble report and patch by Jeff Davis.
2012-12-11 22:09:05 -05:00
Tom Lane cd3413ec36 Disable event triggers in standalone mode.
Per discussion, this seems necessary to allow recovery from broken event
triggers, or broken indexes on pg_event_trigger.

Dimitri Fontaine
2012-12-11 19:28:31 -05:00
Kevin Grittner b19e4250b4 Fix performance problems with autovacuum truncation in busy workloads.
In situations where there are over 8MB of empty pages at the end of
a table, the truncation work for trailing empty pages takes longer
than deadlock_timeout, and there is frequent access to the table by
processes other than autovacuum, there was a problem with the
autovacuum worker process being canceled by the deadlock checking
code. The truncation work done by autovacuum up that point was
lost, and the attempt tried again by a later autovacuum worker. The
attempts could continue indefinitely without making progress,
consuming resources and blocking other processes for up to
deadlock_timeout each time.

This patch has the autovacuum worker checking whether it is
blocking any other thread at 20ms intervals. If such a condition
develops, the autovacuum worker will persist the work it has done
so far, release its lock on the table, and sleep in 50ms intervals
for up to 5 seconds, hoping to be able to re-acquire the lock and
try again. If it is unable to get the lock in that time, it moves
on and a worker will try to continue later from the point this one
left off.

While this patch doesn't change the rules about when and what to
truncate, it does cause the truncation to occur sooner, with less
blocking, and with the consumption of fewer resources when there is
contention for the table's lock.

The only user-visible change other than improved performance is
that the table size during truncation may change incrementally
instead of just once.

This problem exists in all supported versions but is infrequently
reported, although some reports of performance problems when
autovacuum runs might be caused by this. Initial commit is just the
master branch, but this should probably be backpatched once the
build farm and general developer usage confirm that there are no
surprising effects.

Jan Wieck
2012-12-11 14:33:08 -06:00
Heikki Linnakangas 970fb12de1 Consistency check should compare last record replayed, not last record read.
EndRecPtr is the last record that we've read, but not necessarily yet
replayed. CheckRecoveryConsistency should compare minRecoveryPoint with the
last replayed record instead. This caused recovery to think it's reached
consistency too early.

Now that we do the check in CheckRecoveryConsistency correctly, we have to
move the call of that function to after redoing a record. The current place,
after reading a record but before replaying it, is wrong. In particular, if
there are no more records after the one ending at minRecoveryPoint, we don't
enter hot standby until one extra record is generated and read by the
standby, and CheckRecoveryConsistency is called. These two bugs conspired
to make the code appear to work correctly, except for the small window
between reading the last record that reaches minRecoveryPoint, and
replaying it.

In the passing, rename recoveryLastRecPtr, which is the last record
replayed, to lastReplayedEndRecPtr. This makes it slightly less confusing
with replayEndRecPtr, which is the last record read that we're about to
replay.

Original report from Kyotaro HORIGUCHI, further diagnosis by Fujii Masao.
Backpatch to 9.0, where Hot Standby subtly changed the test from
"minRecoveryPoint < EndRecPtr" to "minRecoveryPoint <= EndRecPtr". The
former works because where the test is performed, we have always read one
more record than we've replayed.
2012-12-11 18:54:02 +02:00
Andrew Dunstan ad69bd052f Add mode where contrib installcheck runs each module in a separately named database.
Normally each module is tested in a database named contrib_regression,
which is dropped and recreated at the beginhning of each pg_regress run.
This new mode, enabled by adding USE_MODULE_DB=1 to the make command
line, runs most modules in a database with the module name embedded in
it.

This will make testing pg_upgrade on clusters with the contrib modules
a lot easier.

Second attempt at this, this time accomodating make versions older
than 3.82.

Still to be done: adapt to the MSVC build system.

Backpatch to 9.0, which is the earliest version it is reasonably
possible to test upgrading from.
2012-12-11 11:52:45 -05:00
Heikki Linnakangas 7bffc9b7bf Update minimum recovery point on truncation.
If a file is truncated, we must update minRecoveryPoint. Once a file is
truncated, there's no going back; it would not be safe to stop recovery
at a point earlier than that anymore.

Per report from Kyotaro HORIGUCHI. Backpatch to 8.4. Before that,
minRecoveryPoint was not updated during recovery at all.
2012-12-10 16:57:16 +02:00
Heikki Linnakangas 6be799664a Fix the tracking of min recovery point timeline.
Forgot to update it at the right place. Also, consider checkpoint record
that switches to new timelne to be on the new timeline.

This fixes erroneous "requested timeline 2 does not contain minimum recovery
point" errors, pointed out by Amit Kapila while testing another patch.
2012-12-10 16:04:26 +02:00
Tom Lane b46c92112b Fix assorted bugs in privileges-for-types patch.
Commit 729205571e added privileges on data
types, but there were a number of oversights.  The implementation of
default privileges for types missed a few places, and pg_dump was
utterly innocent of the whole concept.  Per bug #7741 from Nathan Alden,
and subsequent wider investigation.
2012-12-09 00:08:23 -05:00
Tom Lane a99c42f291 Support automatically-updatable views.
This patch makes "simple" views automatically updatable, without the need
to create either INSTEAD OF triggers or INSTEAD rules.  "Simple" views
are those classified as updatable according to SQL-92 rules.  The rewriter
transforms INSERT/UPDATE/DELETE commands on such views directly into an
equivalent command on the underlying table, which will generally have
noticeably better performance than is possible with either triggers or
user-written rules.  A view that has INSTEAD OF triggers or INSTEAD rules
continues to operate the same as before.

For the moment, security_barrier views are not considered simple.
Also, we do not support WITH CHECK OPTION.  These features may be
added in future.

Dean Rasheed, reviewed by Amit Kapila
2012-12-08 18:26:21 -05:00
Simon Riggs ef754fb51b Correct xmax test for COPY FREEZE 2012-12-07 14:18:47 +00:00
Simon Riggs 1f023f9297 Optimize COPY FREEZE with CREATE TABLE also.
Jeff Davis, additional test by me
2012-12-07 13:26:52 +00:00
Simon Riggs 1eb6cee499 Clarify that COPY FREEZE is not a hard rule.
Remove message when FREEZE not honoured,
clarify reasons in comments and docs.
2012-12-07 12:59:05 +00:00
Tom Lane 31a891857a Improve pl/pgsql to support composite-type expressions in RETURN.
For some reason lost in the mists of prehistory, RETURN was only coded to
allow a simple reference to a composite variable when the function's return
type is composite.  Allow an expression instead, while preserving the
efficiency of the original code path in the case where the expression is
indeed just a composite variable's name.  Likewise for RETURN NEXT.

As is true in various other places, the supplied expression must yield
exactly the number and data types of the required columns.  There was some
discussion of relaxing that for pl/pgsql, but no consensus yet, so this
patch doesn't address that.

Asif Rehman, reviewed by Pavel Stehule
2012-12-06 23:09:52 -05:00
Alvaro Herrera da07a1e856 Background worker processes
Background workers are postmaster subprocesses that run arbitrary
user-specified code.  They can request shared memory access as well as
backend database connections; or they can just use plain libpq frontend
database connections.

Modules listed in shared_preload_libraries can register background
workers in their _PG_init() function; this is early enough that it's not
necessary to provide an extra GUC option, because the necessary extra
resources can be allocated early on.  Modules can install more than one
bgworker, if necessary.

Care is taken that these extra processes do not interfere with other
postmaster tasks: only one such process is started on each ServerLoop
iteration.  This means a large number of them could be waiting to be
started up and postmaster is still able to quickly service external
connection requests.  Also, shutdown sequence should not be impacted by
a worker process that's reasonably well behaved (i.e. promptly responds
to termination signals.)

The current implementation lets worker processes specify their start
time, i.e. at what point in the server startup process they are to be
started: right after postmaster start (in which case they mustn't ask
for shared memory access), when consistent state has been reached
(useful during recovery in a HOT standby server), or when recovery has
terminated (i.e. when normal backends are allowed).

In case of a bgworker crash, actions to take depend on registration
data: if shared memory was requested, then all other connections are
taken down (as well as other bgworkers), just like it were a regular
backend crashing.  The bgworker itself is restarted, too, within a
configurable timeframe (which can be configured to be never).

More features to add to this framework can be imagined without much
effort, and have been discussed, but this seems good enough as a useful
unit already.

An elementary sample module is supplied.

Author: Álvaro Herrera

This patch is loosely based on prior patches submitted by KaiGai Kohei,
and unsubmitted code by Simon Riggs.

Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund,
Heikki Linnakangas, Simon Riggs, Amit Kapila
2012-12-06 17:47:30 -03:00
Tom Lane e31d524867 Fix intermittent crash in DROP INDEX CONCURRENTLY.
When deleteOneObject closes and reopens the pg_depend relation,
we must see to it that the relcache pointer held by the calling function
(typically performMultipleDeletions) is updated.  Usually the relcache
entry is retained so that the pointer value doesn't change, which is why
the problem had escaped notice ... but after a cache flush event there's
no guarantee that the same memory will be reassigned.  To fix, change
the recursive functions' APIs so that we pass around a "Relation *"
not just "Relation".

Per investigation of occasional buildfarm failures.  This is trivial
to reproduce with -DCLOBBER_CACHE_ALWAYS, which points up the sad
lack of any buildfarm member running that way on a regular basis.
2012-12-05 23:42:51 -05:00
Alvaro Herrera 5e15cdb2ae Update comment at top of index_create
I neglected to update it in commit f4c4335.

Michael Paquier
2012-12-05 23:09:46 -03:00
Tom Lane af4aba2f05 Ensure recovery pause feature doesn't pause unless users can connect.
If we're not in hot standby mode, then there's no way for users to connect
to reset the recoveryPause flag, so we shouldn't pause.  The code was aware
of this but the test to see if pausing was safe was seriously inadequate:
it wasn't paying attention to reachedConsistency, and besides what it was
testing was that we could legally enter hot standby, not that we have
done so.  Get rid of that in favor of checking LocalHotStandbyActive,
which because of the coding in CheckRecoveryConsistency is tantamount to
checking that we have told the postmaster to enter hot standby.

Also, move the recoveryPausesHere() call that reacts to asynchronous
recoveryPause requests so that it's not in the middle of application of a
WAL record.  I put it next to the recoveryStopsHere() call --- in future
those are going to need to interact significantly, so this seems like a
good waystation.

Also, don't bother trying to read another WAL record if we've already
decided not to continue recovery.  This was no big deal when the code was
written originally, but now that reading a record might entail actions like
fetching an archive file, it seems a bit silly to do it like that.

Per report from Jeff Janes and subsequent discussion.  The pause feature
needs quite a lot more work, but this gets rid of some indisputable bugs,
and seems safe enough to back-patch.
2012-12-05 18:27:50 -05:00
Heikki Linnakangas d67b06fe3e Oops, meant to change the comment in writeTimeLineHistory. 2012-12-05 21:00:59 +02:00
Simon Riggs 6aa2e49a87 Must not reach consistency before XLOG_BACKUP_RECORD
When waiting for an XLOG_BACKUP_RECORD the minRecoveryPoint
will be incorrect, so we must not declare recovery as consistent
before we have seen the record. Major bug allowing recovery to end
too early in some cases, allowing people to see inconsistent db.
This patch to HEAD and 9.2, other fix required for 9.1 and 9.0

Simon Riggs and Andres Freund, bug report by Jeff Janes
2012-12-05 13:28:03 +00:00
Tom Lane cdf498c5d7 Attempt to un-break Windows builds with USE_LDAP.
The buildfarm shows this case is entirely broken, and I'm betting the
reason is lack of any include file.
2012-12-04 17:25:51 -05:00
Michael Meskes ac99ca68d7 Include isinf.o in libecpg if isinf() is not available on the system.
Patch done by Jiang Guiqing <jianggq@cn.fujitsu.com>.
2012-12-04 16:44:22 +01:00
Heikki Linnakangas 90991c40eb Downgrade a status message from LOG to DEBUG2.
I never intended this to be anything other than a debugging aid, but forgot
to change the level before committing.
2012-12-04 17:29:44 +02:00
Heikki Linnakangas 32f4de0adf Write exact xlog position of timeline switch in the timeline history file.
This allows us to do some more rigorous sanity checking for various
incorrect point-in-time recovery scenarios, and provides more information
for debugging purposes. It will also come handy in the upcoming patch to
allow timeline switches to be replicated by streaming replication.
2012-12-04 17:29:07 +02:00
Bruce Momjian a84c30dda5 In initdb.c, move auth warning code into main() from secondary function. 2012-12-04 09:52:00 -05:00
Peter Eisentraut ec8d1e32dd Fix build of LDAP URL feature
Some code was not ifdef'ed out for non-LDAP builds.

patch from Bruce Momjian
2012-12-04 06:42:25 -05:00
Heikki Linnakangas 5ce108bf32 Track the timeline associated with minRecoveryPoint, for more sanity checks.
This allows recovery to notice certain incorrect recovery scenarios.
If a server has recovered to point X on timeline 5, and you restart
recovery, it better be on timeline 5 when it reaches point X again, not on
some timeline with a higher ID. This can happen e.g if you a standby server
is shut down, a new timeline appears in the WAL archive, and the standby
server is restarted. It will try to follow the new timeline, which is wrong
because some WAL on the old timeline was already replayed before shutdown.

Requires an initdb (or at least pg_resetxlog), because this adds a field to
the control file.
2012-12-04 11:31:00 +02:00
Peter Eisentraut aa2fec0a18 Add support for LDAP URLs
Allow specifying LDAP authentication parameters as RFC 4516 LDAP URLs.
2012-12-03 23:31:02 -05:00
Bruce Momjian 26374f2a0f In initdb.c, rename some newly created functions, and move the directory
creation and xlog symlink creation to separate functions.

Per suggestions from Andrew Dunstan.
2012-12-03 23:22:56 -05:00
Bruce Momjian 630cd14426 Add initdb --sync-only option to sync the data directory to durable
storage.

Have pg_upgrade use it, and enable server options fsync=off and
full_page_writes=off.

Document that users turning fsync from off to on should run initdb
--sync-only.

[ Previous commit was incorrectly applied as a git merge. ]
2012-12-03 22:47:59 -05:00
Bruce Momjian 25d1ed04a2 Revert initdb --sync-only patch that had incorrect commit messages. 2012-12-03 22:46:51 -05:00
Bruce Momjian cd7569a546 dummy commit 2012-12-03 22:45:02 -05:00
Bruce Momjian db00d837c1 In pg_upgrade, fix bug where no users were dumped in pg_dumpall
binary-upgrade mode;  instead only skip dumping the current user.

This bug was introduced in during the removal of split_old_dump().  Bug
discovered during local testing.
2012-12-03 19:43:02 -05:00
Andrew Dunstan fc5c1bbbeb Revert "Add mode where contrib installcheck runs each module in a separately named database."
This reverts commit e2b3c21b05.
2012-12-03 15:00:51 -05:00
Simon Riggs 62656617db Avoid holding vmbuffer pin after VACUUM.
During VACUUM if we pause to perform a cycle
of index cleanup we drop the vmbuffer pin,
so we should do the same thing when heap
scan completes. This avoids holding vmbuffer
pin across the main index cleanup in VACUUM,
which could be minutes or hours longer than
necessary for correctness.

Bug report and suggested fix from Pavan Deolasee
2012-12-03 18:53:31 +00:00
Andrew Dunstan d5652e50d5 Attempt to unbreak MSVC builds broken by f21bb9cfb5.
We can't use type uint, so use uint32.
2012-12-03 10:23:22 -05:00
Simon Riggs f21bb9cfb5 Refactor inCommit flag into generic delayChkpt flag.
Rename PGXACT->inCommit flag into delayChkpt flag,
and generalise comments to allow use in other situations,
such as the forthcoming potential use in checksum patch.
Replace wait loop to look for VXIDs with delayChkpt set.
No user visible changes, not behaviour changes at present.

Simon Riggs, reviewed and rebased by Jeff Davis
2012-12-03 13:13:53 +00:00
Simon Riggs 7a764990d8 Clarify locking for PageGetLSN() in XLogCheckBuffer() 2012-12-03 12:20:31 +00:00
Simon Riggs 1c563a2ae1 Clarify when to use PageSetLSN/PageGetLSN().
Update README to explain prerequisites for
correct access to LSN fields of a page.
Independent chunk removed from checksums
patch to reduce size of patch.
2012-12-03 11:59:25 +00:00
Heikki Linnakangas a068c391ab Refactor the code implementing standby-mode logic.
It is now easier to see that it's a state machine, making the code easier
to understand overall.
2012-12-03 12:32:44 +02:00
Andrew Dunstan e2b3c21b05 Add mode where contrib installcheck runs each module in a separately named database.
Normally each module is tested in aq database named contrib_regression,
which is dropped and recreated at the beginhning of each pg_regress run.
This mode, enabled by adding USE_MODULE_DB=1 to the make command line,
runs most modules in a database with the module name embedded in it.

This will make testing pg_upgrade on clusters with the contrib modules
a lot easier.

Still to be done: adapt to the MSVC build system.

Backpatch to 9.0, which is the earliest version it is reasonably possible
to test upgrading from.
2012-12-02 17:20:38 -05:00
Tom Lane fc75d4f81c Update time zone data files to tzdata release 2012j.
DST law changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa,
and portions of Brazil.
2012-12-02 16:35:23 -05:00
Simon Riggs 5457a130d3 Reduce scope of changes for COPY FREEZE.
Allow support only for freezing tuples by explicit
command. Previous coding mistakenly extended
slightly beyond what was agreed as correct on -hackers.
So essentially a partial revoke of earlier work,
leaving just the COPY FREEZE command.
2012-12-02 20:52:52 +00:00
Tom Lane 3114cb60a1 Don't advance checkPoint.nextXid near the end of a checkpoint sequence.
This reverts commit c11130690d in favor of
actually fixing the problem: namely, that we should never have been
modifying the checkpoint record's nextXid at this point to begin with.
The nextXid should match the state as of the checkpoint's logical WAL
position (ie the redo point), not the state as of its physical position.
It's especially bogus to advance it in some wal_levels and not others.
In any case there is no need for the checkpoint record to carry the
same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by
LogStandbySnapshot, as any replay operation will already have adopted
that value as current.

This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug
#6291 from Daniel Farina, in that if a checkpoint were in progress at the
instant of XID wraparound, the epoch bump would be lost as reported.
(And, of course, these days there's at least a 50-50 chance of a checkpoint
being in progress at any given instant.)

Diagnosed by me and independently by Andres Freund.  Back-patch to all
branches supporting hot standby.
2012-12-02 15:20:41 -05:00
Simon Riggs 5c11725867 Rearrange storage of data in xl_running_xacts.
Previously we stored all xids mixed together.
Now we store top-level xids first, followed
by all subxids. Also skip logging any subxids
if the snapshot is suboverflowed, since there
are potentially large numbers of them and they
are not useful in that case anyway. Has value
in the envisaged design for decoding of WAL.
No planned effect on Hot Standby.

Andres Freund, reviewed by me
2012-12-02 19:39:37 +00:00
Simon Riggs c11130690d XidEpoch++ if wraparound during checkpoint.
If wal_level = hot_standby we update the checkpoint nextxid,
though in the case where a wraparound occurred half-way through
a checkpoint we would neglect updating the epoch also. Updating
the nextxid is arguably the wrong thing to do, but changing that
may introduce subtle bugs into hot standby startup, while updating
the value doesn't cause any known bugs yet. Minimal fix now to
HEAD and backbranches, wider fix later in HEAD.

Bug reported in #6291 by Daniel Farina and slightly differently in

Cause analysis and recommended fixes from Tom Lane and Andres Freund.

Applied patch is minimal version of Andres Freund's work.
2012-12-02 14:57:44 +00:00
Simon Riggs 9f98704b82 Clarify operation of online checkpoints.
Previous comments left, but were too obscure
for such an important aspect of the system.
2012-12-02 13:09:55 +00:00
Tatsuo Ishii 53edb8dc02 Fix psql crash while parsing SQL file whose encoding is different from
client encoding and the client encoding is not *safe* one. Such an
example is, file encoding is UTF-8 and client encoding SJIS. Patch
contributed by Jiang Guiqing.
2012-12-02 21:11:15 +09:00
Tom Lane c35fea1026 Prevent passing gmake's environment variables down through pg_regress.
When we do "make install" to create a temp installation, we don't want
that instance of make to try to communicate with any instance of make
that might be calling us.  This is known to cause problems if the
upper make has a -jN flag, and in principle could cause problems even
without that.  Unset the relevant environment variables to prevent such
issues.

Andres Freund
2012-12-01 17:23:49 -05:00
Tom Lane b1346822f3 Make sure sharedir/extension/ directory is created when needed.
The previous coding worked as long as MODULEDIR wasn't set explicitly,
because we create sharedir/$(datamoduledir) and the default value of
that is "extension".  But if some other value is specified for MODULEDIR
then the installation directory needed for the control file wasn't made.

Cédric Villemain
2012-12-01 16:04:39 -05:00
Tom Lane 7b90469b71 Allow adding values to an enum type created in the current transaction.
Normally it is unsafe to allow ALTER TYPE ADD VALUE in a transaction block,
because instances of the value could be added to indexes later in the same
transaction, and then they would still be accessible even if the
transaction rolls back.  However, we can allow this if the enum type itself
was created in the current transaction, because then any such indexes would
have to go away entirely on rollback.

The reason for allowing this is to support pg_upgrade's new usage of
pg_restore --single-transaction: in --binary-upgrade mode, pg_dump emits
enum types as a succession of ALTER TYPE ADD VALUE commands so that it can
preserve the values' OIDs.  The support is a bit limited, so we'll leave
it undocumented.

Andres Freund
2012-12-01 14:27:30 -05:00
Simon Riggs 02aea36414 Second tweak of COPY FREEZE 2012-12-01 14:55:35 +00:00
Simon Riggs ddf509eb4a Tweak tests in COPY FREEZE 2012-12-01 13:46:41 +00:00
Simon Riggs 8de72b66a2 COPY FREEZE and mark committed on fresh tables.
When a relfilenode is created in this subtransaction or
a committed child transaction and it cannot otherwise
be seen by our own process, mark tuples committed ahead
of transaction commit for all COPY commands in same
transaction. If FREEZE specified on COPY
and pre-conditions met then rows will also be frozen.
Both options designed to avoid revisiting rows after commit,
increasing performance of subsequent commands after
data load and upgrade. pg_restore changes later.

Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design
input from Tom Lane, Robert Haas and Kevin Grittner
2012-12-01 12:54:20 +00:00
Alvaro Herrera 113d25c4e6 Change test ExceptionalCondition to return void
Commit 81107282a changed it in assert.c, but overlooked this other file.
2012-11-30 19:24:21 -03:00
Bruce Momjian b86327c1c5 Split initdb.c main() code into multiple functions, for easier
maintenance.
2012-11-30 16:45:08 -05:00
Bruce Momjian 12ee6ec71f In pg_upgrade, dump each database separately and use
--single-transaction to restore each database schema.  This yields
performance improvements for databases with many tables.  Also, remove
split_old_dump() as it is no longer needed.
2012-11-30 16:30:13 -05:00
Bruce Momjian bd9c8e741b Move long_options structures to the top of main() functions, for
consistency.

Per suggestion from Tom.
2012-11-30 14:49:55 -05:00
Tom Lane da63fec7db Add missing buffer lock acquisition in GetTupleForTrigger().
If we had not been holding buffer pin continuously since the tuple was
initially fetched by the UPDATE or DELETE query, it would be possible for
VACUUM or a page-prune operation to move the tuple while we're trying to
copy it.  This would result in a garbage "old" tuple value being passed to
an AFTER ROW UPDATE or AFTER ROW DELETE trigger.  The preconditions for
this are somewhat improbable, and the timing constraints are very tight;
so it's not so surprising that this hasn't been reported from the field,
even though the bug has been there a long time.

Problem found by Andres Freund.  Back-patch to all active branches.
2012-11-30 13:55:55 -05:00
Magnus Hagander 65c3bf19fd Add libpq function PQconninfo()
This allows a caller to get back the exact conninfo array that was
used to create a connection, including parameters read from the
environment.

In doing this, restructure how options are copied from the conninfo
to the actual connection.

Zoltan Boszormenyi and Magnus Hagander
2012-11-30 15:11:08 +09:00
Tom Lane 4af446e7cd Produce a more useful error message for over-length Unix socket paths.
The length of a socket path name is constrained by the size of struct
sockaddr_un, and there's not a lot we can do about it since that is a
kernel API.  However, it would be a good thing if we produced an
intelligible error message when the user specifies a socket path that's too
long --- and getaddrinfo's standard API is too impoverished to do this in
the natural way.  So insert explicit tests at the places where we construct
a socket path name.  Now you'll get an error that makes sense and even
tells you what the limit is, rather than something generic like
"Non-recoverable failure in name resolution".

Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.
2012-11-29 19:57:01 -05:00
Simon Riggs d3fe59939c Correctly init fast path fields on PGPROC 2012-11-29 22:15:52 +00:00
Simon Riggs f1e57a4ec9 Cleanup VirtualXact at end of Hot Standby. 2012-11-29 21:59:11 +00:00
Robert Haas 7a2fe9bd03 Basic binary heap implementation.
There are probably other places where this can be used, but for now,
this just makes MergeAppend use it, so that this code will have test
coverage.  There is other work in the queue that will use this, as
well.

Abhijit Menon-Sen, reviewed by Andres Freund, Robert Haas, Álvaro
Herrera, Tom Lane, and others.
2012-11-29 11:16:59 -05:00
Michael Meskes 086cf1458c When processing nested structure pointer variables ecpg always expected an
array datatype which of course is wrong.

Applied patch by Muhammad Usama <m.usama@gmail.com> to fix this.
2012-11-29 17:12:00 +01:00
Tom Lane 1fc698cf14 Suppress parallel build in interfaces/ecpg/preproc/.
This is to see if it will stop intermittent build failures on buildfarm
member okapi.  We know that gmake 3.82 has some problems with sometimes
not honoring dependencies in parallel builds, and it seems likely that
this is more of the same.  Since the vast bulk of the work in the preproc
directory is associated with creating preproc.c and then preproc.o,
parallelism buys us hardly anything here anyway.

Also, make both this .NOTPARALLEL and the one previously added in
interfaces/ecpg/Makefile be conditional on "ifeq ($(MAKE_VERSION),3.82)".
The known bug in gmake is fixed upstream and should not be present in
3.83 and up, and there's no reason to think it affects older releases.
2012-11-28 22:19:46 -05:00
Tom Lane 3c84046490 Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.
Commit 8cb53654db, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation.  The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY.  This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions.  Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.

To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished.  (This
change obviously is only possible in HEAD.  This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)

In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.

This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.

Problem reported by Amit Kapila, diagnosis by Pavan Deolassee,
fix by Tom Lane and Andres Freund.
2012-11-28 21:26:01 -05:00
Alvaro Herrera 1577b46b7c Split out rmgr rm_desc functions into their own files
This is necessary (but not sufficient) to have them compilable outside
of a backend environment.
2012-11-28 13:01:15 -03:00
Heikki Linnakangas dd7353dde8 If we don't have a backup-end-location, don't claim we've reached it.
This was apparently a typo, which caused recovery to think that it
immediately reached the end of backup, and allowed the database to start
up too early.

Reported by Jeff Janes. Backpatch to 9.2, where this code was introduced.
2012-11-28 15:14:27 +02:00
Tom Lane e78d288c89 Add explicit casts in ilist.h's inline functions.
Needed to silence C++ errors, per report from Peter Eisentraut.

Andres Freund
2012-11-27 10:58:37 -05:00
Heikki Linnakangas 1f67078ea3 Add OpenTransientFile, with automatic cleanup at end-of-xact.
Files opened with BasicOpenFile or PathNameOpenFile are not automatically
cleaned up on error. That puts unnecessary burden on callers that only want
to keep the file open for a short time. There is AllocateFile, but that
returns a buffered FILE * stream, which in many cases is not the nicest API
to work with. So add function called OpenTransientFile, which returns a
unbuffered fd that's cleaned up like the FILE* returned by AllocateFile().

This plugs a few rare fd leaks in error cases:

1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't
   use OpenTransientFile here because the fd is supposed to persist over
   transaction boundaries.
3. lo_import/lo_export - fixed by using OpenTransientFile instead of
   PathNameOpenFile.

In addition to plugging those leaks, this replaces many BasicOpenFile() calls
with OpenTransientFile() that were not leaking, because the code meticulously
closed the file on error. That wasn't strictly necessary, but IMHO it's good
for robustness.

The same leaks exist in older versions, but given the rarity of the issues,
I'm not backpatching this. Not yet, anyway - it might be good to backpatch
later, after this mechanism has had some more testing in master branch.
2012-11-27 10:25:50 +02:00
Tom Lane 532994299e Revert patch for taking fewer snapshots.
This reverts commit d573e239f0, "Take fewer
snapshots".  While that seemed like a good idea at the time, it caused
execution to use a snapshot that had been acquired before locking any of
the tables mentioned in the query.  This created user-visible anomalies
that were not present in any prior release of Postgres, as reported by
Tomas Vondra.  While this whole area could do with a redesign (since there
are related cases that have anomalies anyway), it doesn't seem likely that
any future patch would be reasonably back-patchable; and we don't want 9.2
to exhibit a behavior that's subtly unlike either past or future releases.
Hence, revert to prior code while we rethink the problem.
2012-11-26 15:55:43 -05:00
Tom Lane d3237e04ca Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees.
In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is
pretty useless (there being only one output row), but nonetheless it
shouldn't fail.  But it could fail if "tab" is an inheritance parent,
because planagg.c's code for fixing up equivalence classes after making the
index-optimized MIN/MAX transformation wasn't prepared to find child-table
versions of the aggregate expression.  The least ugly fix seems to be
to add an option to mutate_eclass_expressions() to skip child-table
equivalence class members, which aren't used anymore at this stage of
planning so it's not really necessary to fix them.  Since child members
are ignored in many cases already, it seems plausible for
mutate_eclass_expressions() to have an option to ignore them too.

Per bug #7703 from Maxim Boguk.

Back-patch to 9.1.  Although the same code exists before that, it cannot
encounter child-table aggregates AFAICS, because the index optimization
transformation cannot succeed on inheritance trees before 9.1 (for lack
of MergeAppend).
2012-11-26 12:57:58 -05:00
Michael Meskes c50b8a4637 Applied patch by Chen Huajun <chenhj@cn.fujitsu.com> to make ecpg able to cope
with very long structs.
2012-11-23 14:39:27 +01:00
Tom Lane 455b8887cf Fix pg_resetxlog to use correct path to postmaster.pid.
Since we've already chdir'd into the data directory, the file should
be referenced as just "postmaster.pid", without prefixing the directory
path.  This is harmless in the normal case where an absolute PGDATA path
is used, but quite dangerous if a relative path is specified, since the
program might then fail to notice an active postmaster.

Reported by Hari Babu.  This got broken in my commit
eb5949d190, so patch all active versions.
2012-11-22 11:24:29 -05:00
Heikki Linnakangas 24c19e6bf9 Avoid bogus "out-of-sequence timeline ID" errors in standby-mode.
When startup process opens a WAL segment after replaying part of it, it
validates the first page on the WAL segment, even though the page it's
really interested in later in the file. As part of the validation, it checks
that the TLI on the page header is >= the TLI it saw on the last page it
read. If the segment contains a timeline switch, and we have already
replayed it, and then re-open the WAL segment (because of streaming
replication got disconnected and reconnected, for example), the TLI check
will fail when the first page is validated. Fix that by relaxing the TLI
check when re-opening a WAL segment.

Backpatch to 9.0. Earlier versions had the same code, but before standby
mode was introduced in 9.0, recovery never tried to re-read a segment after
partially replaying it.

Reported by Amit Kapila, while testing a new feature.
2012-11-22 11:44:44 +02:00
Tom Lane 27b2c6a1ef Don't launch new child processes after we've been told to shut down.
Once we've received a shutdown signal (SIGINT or SIGTERM), we should not
launch any more child processes, even if we get signals requesting such.
The normal code path for spawning backends has always understood that,
but the postmaster's infrastructure for hot standby and autovacuum didn't
get the memo.  As reported by Hari Babu in bug #7643, this could lead to
failure to shut down at all in some cases, such as when SIGINT is received
just before the startup process sends PMSIGNAL_RECOVERY_STARTED: we'd
launch a bgwriter and checkpointer, and then those processes would have no
idea that they ought to quit.  Similarly, launching a new autovacuum worker
would result in waiting till it finished before shutting down.

Also, switch the order of the code blocks in reaper() that detect startup
process crash versus shutdown termination.  Once we've sent it a signal,
we should not consider that exit(1) is surprising.  This is just a cosmetic
fix since shutdown occurs correctly anyway, but better not to log a phony
complaint about startup process crash.

Back-patch to 9.0.  Some parts of this might be applicable before that,
but given the lack of prior complaints I'm not going to worry too much
about older branches.
2012-11-21 15:19:30 -05:00
Heikki Linnakangas 5cb0e33597 Speed up operations on numeric, mostly by avoiding palloc() overhead.
In many functions, a NumericVar was initialized from an input Numeric, to be
passed as input to a calculation function. When the NumericVar is not
modified, the digits array of the NumericVar can point directly to the digits
array in the original Numeric, and we can avoid a palloc() and memcpy(). Add
init_var_from_num() function to initialize a var like that.

Remove dscale argument from get_str_from_var(), as all the callers just
passed the dscale of the variable. That means that the rounding it used to
do was not actually necessary, and get_str_from_var() no longer scribbles on
its input. That makes it safer in general, and allows us to use the new
init_var_from_num() function in e.g numeric_out().

Also modified numericvar_to_int8() to no scribble on its input either. It
creates a temporary copy to avoid that. To compensate, the callers no longer
need to create a temporary copy, so the net # of pallocs is the same, but this
is nicer.

In the passing, use a constant for the number 10 in get_str_from_var_sci(),
when calculating 10^exponent. Saves a palloc() and some cycles to convert
integer 10 to numeric.

Original patch by Kyotaro HORIGUCHI, with further changes by me. Reviewed
by Pavel Stehule.
2012-11-21 15:53:35 +02:00
Tom Lane 1f7cb5c309 Improve handling of INT_MIN / -1 and related cases.
Some platforms throw an exception for this division, rather than returning
a necessarily-overflowed result.  Since we were testing for overflow after
the fact, an exception isn't nice.  We can avoid the problem by treating
division by -1 as negation.

Add some regression tests so that we'll find out if any compilers try to
optimize away the overflow check conditions.

This ought to be back-patched, but I'm going to see what the buildfarm
reports about the regression tests first.

Per discussion with Xi Wang, though this is different from the patch he
submitted.
2012-11-19 12:24:25 -05:00
Heikki Linnakangas 644a0a6379 Fix archive_cleanup_command.
When I moved ExecuteRecoveryCommand() from xlog.c to xlogarchive.c, I didn't
realize that it's called from the checkpoint process, not the startup
process. I tried to use InRedo variable to decide whether or not to attempt
cleaning up the archive (must not do so before we have read the initial
checkpoint record), but that variable is only valid within the startup
process.

Instead, let ExecuteRecoveryCommand() always clean up the archive, and add
an explicit argument to RestoreArchivedFile() to say whether that's allowed
or not. The caller knows better.

Reported by Erik Rijkers, diagnosis by Fujii Masao. Only 9.3devel is
affected.
2012-11-19 10:14:20 +02:00
Tom Lane b6e3798f3a Limit values of archive_timeout, post_auth_delay, auth_delay.milliseconds.
The previous definitions of these GUC variables allowed them to range
up to INT_MAX, but in point of fact the underlying code would suffer
overflows or other errors with large values.  Reduce the maximum values
to something that won't misbehave.  There's no apparent value in working
harder than this, since very large delays aren't sensible for any of
these.  (Note: the risk with archive_timeout is that if we're late
checking the state, the timestamp difference it's being compared to
might overflow.  So we need some amount of slop; the choice of INT_MAX/2
is arbitrary.)

Per followup investigation of bug #7670.  Although this isn't a very
significant fix, might as well back-patch.
2012-11-18 17:15:06 -05:00
Tom Lane d038966ddb Fix syslogger to not fail when log_rotation_age exceeds 2^31 milliseconds.
We need to avoid calling WaitLatch with timeouts exceeding INT_MAX.
Fortunately a simple clamp will do the trick, since no harm is done if
the wait times out before it's really time to rotate the log file.
Per bug #7670 (probably bug #7545 is the same thing, too).

In passing, fix bogus definition of log_rotation_age's maximum value in
guc.c --- it was numerically right, but only because MINS_PER_HOUR and
SECS_PER_MINUTE have the same value.

Back-patch to 9.2.  Before that, syslogger wasn't using WaitLatch.
2012-11-18 16:16:39 -05:00
Tom Lane 14ddff44c2 Assert that WaitLatch's timeout is not more than INT_MAX milliseconds.
The behavior with larger values is unspecified by the Single Unix Spec.
It appears that BSD-derived kernels report EINVAL, although Linux does not.
If waiting for longer intervals is desired, the calling code has to do
something to limit the delay; we can't portably fix it here since "long"
may not be any wider than "int" in the first place.

Part of response to bug #7670, though this change doesn't fix that
(in fact, it converts the problem from an ERROR into an Assert failure).
No back-patch since it's just an assertion addition.
2012-11-18 15:39:51 -05:00
Tom Lane 1746ba9256 Improve check_partial_indexes() to consider join clauses in proof attempts.
Traditionally check_partial_indexes() has only looked at restriction
clauses while trying to prove partial indexes usable in queries.  However,
join clauses can also be used in some cases; mainly, that a strict operator
on "x" proves an "x IS NOT NULL" index predicate, even if the operator is
in a join clause rather than a restriction clause.  Adding this code fixes
a regression in 9.2, because previously we would take join clauses into
account when considering whether a partial index could be used in a
nestloop inner indexscan path.  9.2 doesn't handle nestloop inner
indexscans in the same way, and this consideration was overlooked in the
rewrite.  Moving the work to check_partial_indexes() is a better solution
anyway, since the proof applies whether or not we actually use the index
in that particular way, and we don't have to do it over again for each
possible outer relation.  Per report from Dave Cramer.
2012-11-15 19:29:05 -05:00
Tom Lane a235b85a0b Fix the int8 and int2 cases of (minimum possible integer) % (-1).
The correct answer for this (or any other case with arg2 = -1) is zero,
but some machines throw a floating-point exception instead of behaving
sanely.  Commit f9ac414c35 dealt with this
in int4mod, but overlooked the fact that it also happens in int8mod
(at least on my Linux x86_64 machine).  Protect int2mod as well; it's
not clear whether any machines fail there (mine does not) but since the
test is so cheap it seems better safe than sorry.  While at it, simplify
the original guard in int4mod: we need only check for arg2 == -1, we
don't need to check arg1 explicitly.

Xi Wang, with some editing by me.
2012-11-14 17:30:00 -05:00
Bruce Momjian 3bdfd9cb9e Adjust find_status for newer Linux 'nm' output format. 2012-11-13 21:08:07 -05:00
Tom Lane 273986bf0d Fix memory leaks in record_out() and record_send().
record_out() leaks memory: it fails to free the strings returned by the
per-column output functions, and also is careless about detoasted values.
This results in a query-lifespan memory leakage when returning composite
values to the client, because printtup() runs the output functions in the
query-lifespan memory context.  Fix it to handle these issues the same way
printtup() does.  Also fix a similar leakage in record_send().

(At some point we might want to try to run output functions in
shorter-lived memory contexts, so that we don't need a zero-leakage policy
for them.  But that would be a significantly more invasive patch, which
doesn't seem like material for back-patching.)

In passing, use appendStringInfoCharMacro instead of appendStringInfoChar
in the innermost data-copying loop of record_out, to try to shave a few
cycles from this function's runtime.

Per trouble report from Carlos Henrique Reimer.  Back-patch to all
supported versions.
2012-11-13 14:45:26 -05:00
Simon Riggs d9fad1076d Skip searching for subxact locks at commit.
At commit all standby locks are released
for the top-level transaction, so searching
for locks for each subtransaction is both
pointless and costly (N^2) in the presence
of many AccessExclusiveLocks.
2012-11-13 16:00:19 -03:00
Simon Riggs 68f7fe140b Clarify docs on hot standby lock release
Andres Freund and Simon Riggs
2012-11-13 15:54:01 -03:00
Tom Lane 3bbf668de9 Fix multiple problems in WAL replay.
Most of the replay functions for WAL record types that modify more than
one page failed to ensure that those pages were locked correctly to ensure
that concurrent queries could not see inconsistent page states.  This is
a hangover from coding decisions made long before Hot Standby was added,
when it was hardly necessary to acquire buffer locks during WAL replay
at all, let alone hold them for carefully-chosen periods.

The key problem was that RestoreBkpBlocks was written to hold lock on each
page restored from a full-page image for only as long as it took to update
that page.  This was guaranteed to break any WAL replay function in which
there was any update-ordering constraint between pages, because even if the
nominal order of the pages is the right one, any mixture of full-page and
non-full-page updates in the same record would result in out-of-order
updates.  Moreover, it wouldn't work for situations where there's a
requirement to maintain lock on one page while updating another.  Failure
to honor an update ordering constraint in this way is thought to be the
cause of bug #7648 from Daniel Farina: what seems to have happened there
is that a btree page being split was rewritten from a full-page image
before the new right sibling page was written, and because lock on the
original page was not maintained it was possible for hot standby queries to
try to traverse the page's right-link to the not-yet-existing sibling page.

To fix, get rid of RestoreBkpBlocks as such, and instead create a new
function RestoreBackupBlock that restores just one full-page image at a
time.  This function can be invoked by WAL replay functions at the points
where they would otherwise perform non-full-page updates; in this way, the
physical order of page updates remains the same no matter which pages are
replaced by full-page images.  We can then further adjust the logic in
individual replay functions if it is necessary to hold buffer locks
for overlapping periods.  A side benefit is that we can simplify the
handling of concurrency conflict resolution by moving that code into the
record-type-specfic functions; there's no more need to contort the code
layout to keep conflict resolution in front of the RestoreBkpBlocks call.

In connection with that, standardize on zero-based numbering rather than
one-based numbering for referencing the full-page images.  In HEAD, I
removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
still there in the header files in previous branches, but are no longer
used by the code.

In addition, fix some other bugs identified in the course of making these
changes:

spgRedoAddNode could fail to update the parent downlink at all, if the
parent tuple is in the same page as either the old or new split tuple and
we're not doing a full-page image: it would get fooled by the LSN having
been advanced already.  This would result in permanent index corruption,
not just transient failure of concurrent queries.

Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
tail page as a candidate for a full-page image; in the worst case this
could result in torn-page corruption.

heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
exclusive lock: it did the former in the normal path but the latter for a
full-page image.  A plain exclusive lock seems sufficient, so change to
that.

Also, remove gistRedoPageDeleteRecord(), which has been dead code since
VACUUM FULL was rewritten.

Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
had a significantly different WAL-logging scheme for GIST index updates,
and it doesn't appear possible to make that scheme safe for concurrent hot
standby queries, because it can leave inconsistent states in the index even
between WAL records.  Given the lack of complaints from the field, we won't
work too hard on fixing that branch.
2012-11-12 22:05:53 -05:00
Heikki Linnakangas dbdf9679d7 Use correct text domain for translating errcontext() messages.
errcontext() is typically used in an error context callback function, not
within an ereport() invocation like e.g errmsg and errdetail are. That means
that the message domain that the TEXTDOMAIN magic in ereport() determines
is not the right one for the errcontext() calls. The message domain needs to
be determined by the C file containing the errcontext() call, not the file
containing the ereport() call.

Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use
for the errcontext message. "errcontext" was used in a few places as a
variable or struct field name, I had to rename those out of the way, now
that errcontext is a macro.

We've had this problem all along, but this isn't doesn't seem worth
backporting. It's a fairly minor issue, and turning errcontext from a
function to a macro requires at least a recompile of any external code that
calls errcontext().
2012-11-12 17:07:29 +02:00
Heikki Linnakangas c9d44a75d4 Silence "expression result unused" warnings in AssertVariableIsOfTypeMacro
At least clang 3.1 generates those warnings. Prepend (void) to avoid them,
like we have in AssertMacro.
2012-11-12 15:02:40 +02:00
Tom Lane 34f3b396a6 Check for stack overflow in transformSetOperationTree().
Since transformSetOperationTree() recurses, it can be driven to stack
overflow with enough UNION/INTERSECT/EXCEPT clauses in a query.  Add a
check to ensure it fails cleanly instead of crashing.  Per report from
Matthew Gerber (though it's not clear whether this is the only thing
going wrong for him).

Historical note: I think the reasoning behind not putting a check here in
the beginning was that the check in transformExpr() ought to be sufficient
to guard the whole parser.  However, because transformSetOperationTree()
recurses all the way to the bottom of the set-operation tree before doing
any analysis of the statement's expressions, that check doesn't save it.
2012-11-11 19:56:10 -05:00
Alvaro Herrera fa12cb7f02 Remove leftover LWLockRelease() call
This code was refactored in d5497b95 but an extra LWLockRelease call was
left behind.

Per report from Erik Rijkers
2012-11-09 10:19:34 -03:00
Tom Lane 3e7fdcffd6 Fix WaitLatch() to return promptly when the requested timeout expires.
If the sleep is interrupted by a signal, we must recompute the remaining
time to wait; otherwise, a steady stream of non-wait-terminating interrupts
could delay return from WaitLatch indefinitely.  This has been shown to be
a problem for the autovacuum launcher, and there may well be other places
now or in the future with similar issues.  So we'd better make the function
robust, even though this'll add at least one gettimeofday call per wait.

Back-patch to 9.2.  We might eventually need to fix 9.1 as well, but the
code is quite different there, and the usage of WaitLatch in 9.1 is so
limited that it's not clearly important to do so.

Reported and diagnosed by Jeff Janes, though I rewrote his patch rather
heavily.
2012-11-08 20:04:48 -05:00
Tom Lane dcc55dd21a Rename ResolveNew() to ReplaceVarsFromTargetList(), and tweak its API.
This function currently lacks the option to throw error if the provided
targetlist doesn't have any matching entry for a Var to be replaced.
Two of the four existing call sites would be better off with an error,
as would the usage in the pending auto-updatable-views patch, so it seems
past time to extend the API to support that.  To do so, replace the "event"
parameter (historically of type CmdType, though it was declared plain int)
with a special-purpose enum type.

It's unclear whether this function might be called by third-party code.
Since many C compilers wouldn't warn about a call site continuing to use
the old calling convention, rename the function to forcibly break any
such code that hasn't been updated.  The old name was none too well chosen
anyhow.
2012-11-08 16:52:49 -05:00
Tom Lane 75af5ae9c0 Don't trash input list structure in does_not_exist_skipping().
The trigger and rule cases need to split up the input name list, but
they mustn't corrupt the passed-in data structure, since it could be part
of a cached utility-statement parsetree.  Per bug #7641.
2012-11-08 11:34:32 -05:00
Heikki Linnakangas a9dad56441 Teach pg_basebackup and pg_receivexlog to reply to server keepalives.
Without this, the connection will be killed after timeout if
wal_sender_timeout is set in the server.

Original patch by Amit Kapila, modified by me to fit recent changes in the
code.
2012-11-08 10:28:52 +02:00
Tom Lane 9e45e03886 Fix missing inclusions.
Some platforms require including <netinet/in.h> and/or <arpa/inet.h> to
use htonl() and ntohl().  Per build failure locally.
2012-11-07 18:06:03 -05:00
Bruce Momjian aa69670e42 Add URLs to document why DLLIMPORT is needed on Windows.
Per email from Craig Ringer
2012-11-07 15:01:25 -05:00
Alvaro Herrera 4ee5c40b06 Don't try to use a unopened relation
Commit 4c9d0901 mistakenly introduced a call to
TransferPredicateLocksToHeapRelation() on an index relation that had
been closed a few lines above.  Moving up an index_open() call that's
below is enough to fix the problem.

Discovered by me while testing an unrelated patch.
2012-11-07 16:23:39 -03:00
Heikki Linnakangas add6c3179a Make the streaming replication protocol messages architecture-independent.
We used to send structs wrapped in CopyData messages, which works as long as
the client and server agree on things like endianess, timestamp format and
alignment. That's good enough for running a standby server, which has to run
on the same platform anyway, but it's useful for tools like pg_receivexlog
to work across platforms.

This breaks protocol compatibility of streaming replication, but we never
promised that to be compatible across versions, anyway.
2012-11-07 19:09:13 +02:00
Tom Lane 5ed6546cf7 Fix handling of inherited check constraints in ALTER COLUMN TYPE.
This case got broken in 8.4 by the addition of an error check that
complains if ALTER TABLE ONLY is used on a table that has children.
We do use ONLY for this situation, but it's okay because the necessary
recursion occurs at a higher level.  So we need to have a separate
flag to suppress recursion without making the error check.

Reported and patched by Pavan Deolasee, with some editorial adjustments by
me.  Back-patch to 8.4, since this is a regression of functionality that
worked in earlier branches.
2012-11-05 13:36:16 -05:00
Tom Lane ef28e05ac5 Fix bogus handling of $(X) (i.e., ".exe") in isolationtester Makefile.
I'm not sure why commit 1eb1dde049 seems
to have made this start to fail on Cygwin when it never did before ---
but nonetheless, the coding was pretty bogus, and unlike the way we
handle $(X) anywhere else.  Per buildfarm.
2012-11-01 19:48:53 -04:00
Tom Lane 19e36477b3 Limit the number of rel sets considered in consider_index_join_outer_rels.
In bug #7626, Brian Dunavant exposes a performance problem created by
commit 3b8968f25232ad09001bf35ab4cc59f5a501193e: that commit attempted to
consider *all* possible combinations of indexable join clauses, but if said
clauses join to enough different relations, there's an exponential increase
in the number of outer-relation sets considered.

In Brian's example, all the clauses come from the same equivalence class,
which means it's redundant to use more than one of them in an indexscan
anyway.  So we can prevent the problem in this class of cases (which is
probably the majority of real examples) by rejecting combinations that
would only serve to add a known-redundant clause.

But that still leaves us exposed to exponential growth of planning time
when the query has a lot of non-equivalence join clauses that are usable
with the same index.  I chose to prevent such cases by setting an upper
limit on the number of relation sets considered, equal to ten times the
number of index clauses considered so far.  (This sliding limit still
allows new relsets to be added on as we move to additional index columns,
which is probably more important than considering even more combinations of
clauses for the previous column.)  This should keep the amount of work done
roughly linear rather than exponential in the apparent query complexity.
This part of the fix is pretty ad-hoc; but without a clearer idea of
real-world cases for which this would result in markedly inferior plans,
it's hard to see how to do better.
2012-11-01 14:08:42 -04:00
Peter Eisentraut 1eb1dde049 Have make never delete intermediate files automatically
Several hacks in certain modes already thought this was a bad idea, so
just disable it globally.
2012-10-31 23:33:35 -04:00
Alvaro Herrera 2f1692d213 Fix erroneous choice of timeline variable, too 2012-10-31 17:05:55 -03:00
Alvaro Herrera 9b8dd7e8aa Fix erroneous choices of segNo variables
Commit dfda6eba (which changed segment numbers to use a single 64 bit
variable instead of log/seg) introduced a couple of bogus choices of
exactly which log segment number variable to use in each case.

This is currently pretty harmless; in one place, the bogus number was
only being used in an error message for a pretty unlikely condition
(failure to fsync a WAL segment file).  In the other, it was using a
global variable instead of the local variable; but all callsites were
passing the value of the global variable anyway.

No need to backpatch because that commit is not on earlier branches.
2012-10-31 11:05:28 -03:00
Alvaro Herrera 04f28bdb84 Fix ALTER EXTENSION / SET SCHEMA
In its original conception, it was leaving some objects into the old
schema, but without their proper pg_depend entries; this meant that the
old schema could be dropped, causing future pg_dump calls to fail on the
affected database.  This was originally reported by Jeff Frost as #6704;
there have been other complaints elsewhere that can probably be traced
to this bug.

To fix, be more consistent about altering a table's subsidiary objects
along the table itself; this requires some restructuring in how tables
are relocated when altering an extension -- hence the new
AlterTableNamespaceInternal routine which encapsulates it for both the
ALTER TABLE and the ALTER EXTENSION cases.

There was another bug lurking here, which was unmasked after fixing the
previous one: certain objects would be reached twice via the dependency
graph, and the second attempt to move them would cause the entire
operation to fail.  Per discussion, it seems the best fix for this is to
do more careful tracking of objects already moved: we now maintain a
list of moved objects, to avoid attempting to do it twice for the same
object.

Authors: Alvaro Herrera, Dimitri Fontaine
Reviewed by Tom Lane
2012-10-31 10:52:55 -03:00
Peter Eisentraut 4af3dda136 Preserve intermediate .c files in coverage mode
The introduction of the .y -> .c pattern rule causes some .c files such
as bootparse.c to be considered intermediate files in the .y -> .c -> .o
rule chain, which make would automatically delete.  But in coverage
mode, the processing tools such as genhtml need those files, so mark
them as "precious" so that make preserves them.
2012-10-28 10:35:46 -04:00
Kevin Grittner 6868ed7491 Throw error if expiring tuple is again updated or deleted.
This prevents surprising behavior when a FOR EACH ROW trigger
BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or
deletes the the old row.  Prior to this patch the requested action
on the row could be silently ignored while all triggered actions
based on the occurence of the requested action could be committed.
One example of how this could happen is if the BEFORE DELETE
trigger for a "parent" row deleted "children" which had trigger
functions to update summary or status data on the parent.

This also prevents similar surprising problems if the query has a
volatile function which updates a target row while it is already
being updated.

There are related issues present in FOR UPDATE cursors and READ
COMMITTED queries which are not handled by this patch.  These
issues need further evalution to determine what change, if any, is
needed.

Where the new error messages are generated, in most cases the best
fix will be to move code from the BEFORE trigger to an AFTER
trigger.  Where this is not feasible, the trigger can avoid the
error by re-issuing the triggering statement and returning NULL.

Documentation changes will be submitted in a separate patch.

Kevin Grittner and Tom Lane with input from Florian Pflug and
Robert Haas, based on problems encountered during conversion of
Wisconsin Circuit Court trigger logic to plpgsql triggers.
2012-10-26 14:55:36 -05:00
Tom Lane 17804fa71b Prefer actual constants to pseudo-constants in equivalence class machinery.
generate_base_implied_equalities_const() should prefer plain Consts over
other em_is_const eclass members when choosing the "pivot" value that
all the other members will be equated to.  This makes it more likely that
the generated equalities will be useful in constraint-exclusion proofs.
Per report from Rushabh Lathia.
2012-10-26 14:19:34 -04:00
Tom Lane 5a39114fe7 In pg_dump, dump SEQUENCE SET items in the data not pre-data section.
Represent a sequence's current value as a separate TableDataInfo dumpable
object, so that it can be dumped within the data section of the archive
rather than in pre-data.  This fixes an undesirable inconsistency between
the meanings of "--data-only" and "--section=data", and also fixes dumping
of sequences that are marked as extension configuration tables, as per a
report from Marko Kreen back in July.  The main cost is that we do one more
SQL query per sequence, but that's probably not very meaningful in most
databases.

Back-patch to 9.1, since it has the extension configuration issue even
though not the --section switch.
2012-10-26 12:12:42 -04:00
Tom Lane bf01e34b55 Tweak genericcostestimate's fudge factor for index size.
To provide some bias against using a large index when a small one would do
as well, genericcostestimate adds a "fudge factor", which for a long time
was random_page_cost * index_pages/10000.  However, this can grow to be the
dominant term in indexscan cost estimates when the index involved is large
enough, a behavior that was never intended.  Change to a ln(1 + n/10000)
formulation, which has nearly the same behavior up to a few hundred pages
but tails off significantly thereafter.  (A log curve seems correct on
first principles, since what we're trying to account for here is index
descent costs, which are typically logarithmic.)  Per bug #7619 from Niko
Kiirala.

Possibly this change should get back-patched, but I'm hesitant to mess with
cost estimates in stable branches.
2012-10-24 16:25:40 -04:00
Tom Lane a4e8680a6c When converting a table to a view, remove its system columns.
Views should not have any pg_attribute entries for system columns.
However, we forgot to remove such entries when converting a table to a
view.  This could lead to crashes later on, if someone attempted to
reference such a column, as reported by Kohei KaiGai.

Patch in HEAD only.  This bug has been there forever, but in the back
branches we will have to defend against existing mis-converted views,
so it doesn't seem worthwhile to change the conversion code too.
2012-10-24 13:39:37 -04:00
Alvaro Herrera f4c4335a4a Add context info to OAT_POST_CREATE security hook
... and have sepgsql use it to determine whether to check permissions
during certain operations.  Indexes that are being created as a result
of REINDEX, for instance, do not need to have their permissions checked;
they were already checked when the index was created.

Author: KaiGai Kohei, slightly revised by me
2012-10-23 18:24:24 -03:00
Kevin Grittner 4c9d0901f1 Correct predicate locking for DROP INDEX CONCURRENTLY.
For the non-concurrent case there is an AccessExclusiveLock lock
on both the index and the heap at a time during which no other
process is using either, before which the index is maintained and
used for scans, and after which the index is no longer used or
maintained.  Predicate locks can safely be moved from the index to
the related heap relation under the protection of these locks.
This was done prior to the introductin of DROP INDEX CONCURRENTLY
and continues to be done for non-concurrent index drops.

For concurrent index drops, the predicate locks must be moved when
there are no index scans in progress on that index and no more can
subsequently start, and before heap inserts stop maintaining the
index.  As long as these conditions are guaranteed when the
TransferPredicateLocksToHeapRelation() function is called,
stronger locks are not needed for correctness.

Kevin Grittner based on questions by Tom Lane in reviewing the
DROP INDEX CONCURRENTLY patch and in cooperation with Andres
Freund and Simon Riggs.
2012-10-21 16:35:42 -05:00
Tom Lane edef20f6e1 Fix pg_dump's handling of DROP DATABASE commands in --clean mode.
In commit 4317e0246c, I accidentally broke
this behavior while rearranging code to ensure that --create wouldn't
affect whether a DATABASE entry gets put into archive-format output.
Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is
either useless or dangerous depending on the usage scenario.
It should not do that, and no longer does.

A bright spot is that this refactoring makes it easy to allow the
combination of --clean and --create to work sensibly, ie, emit DROP
DATABASE then CREATE DATABASE before reconnecting.  Ordinarily we'd
consider that a feature addition and not back-patch it, but it seems
silly to not include the extra couple of lines required in the 9.2
version of the code.

Per report from Guillaume Lelarge, though this is slightly more extensive
than his proposed patch.
2012-10-20 16:58:32 -04:00
Tom Lane 5d1abe64e6 Fix UtilityContainsQuery() to handle CREATE TABLE AS EXECUTE correctly.
The code seems to have been written to handle the pre-parse-analysis
representation, where an ExecuteStmt would appear directly under
CreateTableAsStmt.  But in reality the function is only run on
already-parse-analyzed statements, so there will be a Query node in
between.  We'd not noticed the bug because the function is generally
not used at all except in extended query protocol.

Per report from Robert Haas and Rushabh Lathia.
2012-10-19 18:33:45 -04:00
Tom Lane 4e32f8cd14 Fix hash_search to avoid corruption of the hash table on out-of-memory.
An out-of-memory error during expand_table() on a palloc-based hash table
would leave a partially-initialized entry in the table.  This would not be
harmful for transient hash tables, since they'd get thrown away anyway at
transaction abort.  But for long-lived hash tables, such as the relcache
hash, this would effectively corrupt the table, leading to crash or other
misbehavior later.

To fix, rearrange the order of operations so that table enlargement is
attempted before we insert a new entry, rather than after adding it
to the hash table.

Problem discovered by Hitoshi Harada, though this is a bit different
from his proposed patch.
2012-10-19 15:24:03 -04:00
Tom Lane 0d6895051a Fix ruleutils to print "INSERT INTO foo DEFAULT VALUES" correctly.
Per bug #7615 from Marko Tiikkaja.  Apparently nobody ever tried this
case before ...
2012-10-19 13:39:51 -04:00
Simon Riggs da85727565 Fix orphan on cancel of drop index concurrently.
Canceling DROP INDEX CONCURRENTLY during
wait could allow an orphaned index to be
left behind which could not be dropped.

Backpatch to 9.2

Andres Freund, tested by Abhijit Menon-Sen
2012-10-19 09:56:29 +01:00
Tom Lane 002191a1a3 Further cleanup of catcache.c ilist changes.
Remove useless duplicate initialization of bucket headers, don't use a
dlist_mutable_iter in a performance-critical path that doesn't need it,
make some other cosmetic changes for consistency's sake.
2012-10-18 19:30:43 -04:00
Tom Lane dc5aeca168 Remove unnecessary "head" arguments from some dlist/slist functions.
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after
do not need access to the list header, and indeed insisting on that negates
one of the main advantages of a doubly-linked list.

In consequence, revert addition of "cache_bucket" field to CatCTup.
2012-10-18 19:04:20 -04:00
Tom Lane 8f8d746478 Code review for inline-list patch.
Make foreach macros less syntactically dangerous, and fix some typos in
evidently-never-tested ones.  Add missing slist_next_node and
slist_head_node functions.  Fix broken dlist_check code.  Assorted comment
improvements.
2012-10-18 16:47:07 -04:00
Heikki Linnakangas 2a49585e2b Further tweaking of the readfile() function in pg_ctl.
Don't leak a file descriptor if the file is empty or we can't read its size.

Expect there to be a newline at the end of the last line, too. If there
isn't, ignore anything after the last newline. This makes it a tiny bit
more robust in case the file is appended to concurrently, so that we don't
return the last line if it hasn't been fully written yet. And this makes
the code a bit less obscure, anyway. Per Tom Lane's suggestion.

Backpatch to all supported branches.
2012-10-18 22:26:26 +03:00
Simon Riggs 160984c8c8 Isolation test for DROP INDEX CONCURRENTLY
for recent concurrent changes.

Abhijit Menon-Sen
2012-10-18 19:41:40 +01:00
Simon Riggs 2f0e480d02 Re-think guts of DROP INDEX CONCURRENTLY.
Concurrent behaviour was flawed when using
a two-step process, so add an additional
phase of processing to ensure concurrency
for both SELECTs and INSERT/UPDATE/DELETEs.

Backpatch to 9.2

Andres Freund, tweaked by me
2012-10-18 18:58:30 +01:00
Tom Lane 72a4231f0c Fix planning of non-strict equivalence clauses above outer joins.
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable.  (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.)  The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence.  Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.

In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs.  This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places.  (This accounts for bug #7604 from Bill MacArthur.)

Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses.  Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.

Add regression tests illustrating these problems.  In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
2012-10-18 12:30:10 -04:00
Alvaro Herrera 7b583b20b1 pg_dump: Output functions deterministically sorted
Implementation idea from Tom Lane

Author: Joel Jacobson
Reviewed by Joachim Wieland
2012-10-18 12:23:27 -03:00
Simon Riggs 5ad72cee7e Revert tests for drop index concurrently. 2012-10-18 15:27:12 +01:00
Simon Riggs 4e206744dc Add isolation tests for DROP INDEX CONCURRENTLY.
Backpatch to 9.2 to ensure bugs are fixed.

Abhijit Menon-Sen
2012-10-18 13:37:09 +01:00
Tom Lane ff3f9c8de5 Close un-owned SMgrRelations at transaction end.
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end.  This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely.  Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE.  We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)

This code replaces a previous attempt towards the same goal that proved
to be unreliable.  Back-patch to 9.1 where the previous patch was added.
2012-10-17 12:38:21 -04:00
Tom Lane 9bacf0e373 Revert "Use "transient" files for blind writes, take 2".
This reverts commit fba105b109.
That approach had problems with the smgr-level state not tracking what
we really want to happen, and with the VFD-level state not tracking the
smgr-level state very well either.  In consequence, it was still possible
to hold kernel file descriptors open for long-gone tables (as in recent
report from Tore Halset), and yet there were also cases of FDs being closed
undesirably soon.  A replacement implementation will follow.
2012-10-17 12:37:08 -04:00
Alvaro Herrera a66ee69add Embedded list interface
Provide a common implementation of embedded singly-linked and
doubly-linked lists.  "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.

Most of the implementation uses inlineable functions (where supported),
for performance.

Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes.  Other uses can (and probably will) be
converted in the future.  Since dllist.c is unused after this conversion,
it has been removed.

Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
2012-10-17 11:31:20 -03:00
Bruce Momjian 22cc3b35f4 When outputting the session id in log_line_prefix (%c) or in CSV log
output mode, cause the hex digits after the period to always be at least
four hex digits, with zero-padding.
2012-10-16 12:37:59 -04:00
Tom Lane b72bd3d1c6 alter_generic regression test cannot run concurrently with privileges test.
... because the latter plays games with the privileges for language SQL.
It looks like running alter_generic in parallel with "misc" is OK though.

Also, adjust serial_schedule to maintain the same test ordering (up to
parallelism) as parallel_schedule.
2012-10-15 12:18:52 -04:00
Heikki Linnakangas 7d3ed5ae78 Fix typo in comment.
Fujii Masao
2012-10-15 13:01:31 +03:00
Heikki Linnakangas ff6c78c480 Remove comment that is no longer true.
AddToDataDirLockFile() supports out-of-order updates of the lockfile
nowadays.
2012-10-15 11:03:39 +03:00
Heikki Linnakangas 5c89684e08 Fix race condition in pg_ctl reading postmaster.pid.
If postmaster changed postmaster.pid while pg_ctl was reading it, pg_ctl
could overrun the buffer it allocated for the file. Fix by reading the
whole file to memory with one read() call.

initdb contains an identical copy of the readfile() function, but the files
that initdb reads are static, not modified concurrently. Nevertheless, add
a simple bounds-check there, if only to silence static analysis tools.

Per report from Dave Vitek. Backpatch to all supported branches.
2012-10-15 10:36:32 +03:00
Tom Lane e81e8f9342 Split up process latch initialization for more-fail-soft behavior.
In the previous coding, new backend processes would attempt to create their
self-pipe during the OwnLatch call in InitProcess.  However, pipe creation
could fail if the kernel is short of resources; and the system does not
recover gracefully from a FATAL error right there, since we have armed the
dead-man switch for this process and not yet set up the on_shmem_exit
callback that would disarm it.  The postmaster then forces an unnecessary
database-wide crash and restart, as reported by Sean Chittenden.

There are various ways we could rearrange the code to fix this, but the
simplest and sanest seems to be to split out creation of the self-pipe into
a new function InitializeLatchSupport, which must be called from a place
where failure is allowed.  For most processes that gets called in
InitProcess or InitAuxiliaryProcess, but processes that don't call either
but still use latches need their own calls.

Back-patch to 9.1, which has only a part of the latch logic that 9.2 and
HEAD have, but nonetheless includes this bug.
2012-10-14 22:59:56 -04:00
Tom Lane 8b728e5c6e Fix oversight in new code for printing rangetable aliases.
In commit 11e131854f, I missed the case of
a CTE RTE that doesn't have a user-defined alias, but does have an
alias assigned by set_rtable_names().  Per report from Peter Eisentraut.

While at it, refactor slightly to reduce code duplication.
2012-10-12 16:14:43 -04:00
Bruce Momjian 49ec613201 In our source code, make a copy of getopt's 'optarg' string arguments,
rather than just storing a pointer.
2012-10-12 13:35:43 -04:00
Tom Lane a29f7ed554 Get rid of COERCE_DONTCARE.
We don't need this hack any more.
2012-10-12 13:35:00 -04:00
Tom Lane 71e58dcfb9 Make equal() ignore CoercionForm fields for better planning with casts.
This change ensures that the planner will see implicit and explicit casts
as equivalent for all purposes, except in the minority of cases where
there's actually a semantic difference (as reflected by having a 3-argument
cast function).  In particular, this fixes cases where the EquivalenceClass
machinery failed to consider two references to a varchar column as
equivalent if one was implicitly cast to text but the other was explicitly
cast to text, as seen in bug #7598 from Vaclav Juza.  We have had similar
bugs before in other parts of the planner, so I think it's time to fix this
problem at the core instead of continuing to band-aid around it.

Remove set_coercionform_dontcare(), which represents the band-aid
previously in use for allowing matching of index and constraint expressions
with inconsistent cast labeling.  (We can probably get rid of
COERCE_DONTCARE altogether, but I don't think removing that enum value in
back branches would be wise; it's possible there's third party code
referring to it.)

Back-patch to 9.2.  We could go back further, and might want to once this
has been tested more; but for the moment I won't risk destabilizing plan
choices in long-since-stable branches.
2012-10-12 12:11:22 -04:00
Andrew Dunstan e583ffe947 Unbreak MSVC builds after recent Makefile refactoring.
Based on a suggestion by Peter Eisentraut.
2012-10-11 12:36:42 -04:00
Tom Lane 4816d2ea32 Fix cross-type case in partial row matching for hashed subplans.
When hashing a subplan like "WHERE (a, b) NOT IN (SELECT x, y FROM ...)",
findPartialMatch() attempted to match rows using the hashtable's internal
equality operators, which of course are for x and y's datatypes.  What we
need to use are the potentially cross-type operators for a=x, b=y, etc.
Failure to do that leads to wrong answers or even crashes.  The scope for
problems is limited to cases where we have different types with compatible
hash functions (else we'd not be using a hashed subplan), but for example
int4 vs int8 can cause the problem.

Per bug #7597 from Bo Jensen.  This has been wrong since the hashed-subplan
code was written, so patch all the way back.
2012-10-11 12:22:13 -04:00
Heikki Linnakangas 6f60fdd701 Improve replication connection timeouts.
Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.

It is no longer necessary to manually set wal_receiver_status_interval to
a value smaller than the timeout. Both wal sender and receiver now
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the other end.

Amit Kapila, heavily edited by me.
2012-10-11 17:48:08 +03:00
Peter Eisentraut 8521d13194 Refactor flex and bison make rules
Numerous flex and bison make rules have appeared in the source tree
over time, and they are all virtually identical, so we can replace
them by pattern rules with some variables for customization.

Users of pgxs will also be able to benefit from this.
2012-10-11 06:57:04 -04:00
Peter Eisentraut ab112068b6 Remove _FORTIFY_SOURCE
Apparently, on some glibc versions this causes warnings when
optimization is not enabled.

Altogether, there appear to be too many incompatibilities surrounding
this.
2012-10-10 21:42:38 -04:00
Tom Lane 864db11683 Update obsolete comment.
We no longer use GetNewOidWithIndex on pg_largeobject; rather,
pg_largeobject_metadata's regular OID column is considered the repository
of OIDs for large objects.  The special functionality is still needed for
TOAST tables however.
2012-10-10 17:04:37 -04:00
Tom Lane a80889a735 Set procost to 10 for each of the pg_foo_is_visible() functions.
The idea here is to make sure the planner will evaluate these functions
last not first among the filter conditions in psql pattern search and
tab-completion queries.  We've discussed this several times, and there
was consensus to do it back in August, but we didn't want to do it just
before a release.  Now seems like a safer time.

No catversion bump, since this catalog change doesn't create a backend
incompatibility nor any regression test result changes.
2012-10-10 12:19:25 -04:00
Tom Lane 3f88fa971a Fix PGXS support for building loadable modules on AIX.
Building a shlib on AIX requires use of the mkldexport.sh script, but we
failed to install that, preventing its use from non-source-tree contexts.
Also, Makefile.aix had the wrong idea about where to find the installed
copy of the postgres.imp symbol file used by AIX.

Per report from John Pierce.  Patch all the way back, since this has been
broken since the beginning of PGXS.
2012-10-09 21:04:06 -04:00
Tom Lane 7e0cce0265 Remove unnecessary overhead in backend's large-object operations.
Do read/write permissions checks at most once per large object descriptor,
not once per lo_read or lo_write call as before.  The repeated tests were
quite useless in the read case since the snapshot-based tests were
guaranteed to produce the same answer every time.  In the write case,
the extra tests could in principle detect revocation of write privileges
after a series of writes has started --- but there's a race condition there
anyway, since we'd check privileges before performing and certainly before
committing the write.  So there's no real advantage to checking every
single time, and we might as well redefine it as "only check the first
time".

On the same reasoning, remove the LargeObjectExists checks in inv_write
and inv_truncate.  We already checked existence when the descriptor was
opened, and checking again doesn't provide any real increment of safety
that would justify the cost.
2012-10-09 16:38:00 -04:00
Heikki Linnakangas 2d8c81ac86 Fix silly bug in previous refactoring.
I extracted the refactoring patch from a larger patch that contained other
changes too, but missed one unintentional change and didn't test enough...
2012-10-09 19:33:12 +03:00
Heikki Linnakangas ff8f160bf4 Put the logic to wait for WAL in standby mode to a separate function.
This is just refactoring with no user-visible effect, to make the code more
readable.
2012-10-09 19:20:17 +03:00
Alvaro Herrera f46baf601d Rename USE_INLINE to PG_USE_INLINE
The former name was too likely to conflict with symbols from external
headers; and, as seen in recent buildfarm failures in member spoonbill,
it has now happened at least in plpython.
2012-10-09 11:17:33 -03:00
Heikki Linnakangas 0b77aebabf Remove stray newline in comment. 2012-10-09 13:06:48 +03:00
Tom Lane bc433317ae Fix lo_import and lo_export to return useful error messages more often.
I found that these functions tend to return -1 while leaving an empty error
message string in the PGconn, if they suffer some kind of I/O error on the
file.  The reason is that lo_close, which thinks it's executed a perfectly
fine SQL command, clears the errorMessage.  The minimum-change workaround
is to reorder operations here so that we don't fill the errorMessage until
after lo_close.
2012-10-08 21:52:34 -04:00
Tom Lane f52c5165e1 Fix lo_export usage in example programs.
lo_export returns -1, not zero, on failure.
2012-10-08 21:19:54 -04:00
Tom Lane 0e924c007d Fix lo_read, lo_write, lo_truncate to cope with "size_t" length parameters.
libpq defines these functions as accepting "size_t" lengths ... but the
underlying backend functions expect signed int32 length parameters, and so
will misinterpret any value exceeding INT_MAX.  Fix the libpq side to throw
error rather than possibly doing something unexpected.

This is a bug of long standing, but I doubt it's worth back-patching.  The
problem is really pretty academic anyway with lo_read/lo_write, since any
caller expecting sane behavior would have to have provided a multi-gigabyte
buffer.  It's slightly more pressing with lo_truncate, but still we haven't
supported large objects over 2GB until now.
2012-10-08 21:19:53 -04:00
Peter Eisentraut b6d4522296 Remove generation of repl_gram.h
It was apparently never necessary.
2012-10-08 20:36:46 -04:00
Tom Lane 26fe56481c Code review for 64-bit-large-object patch.
Fix broken-on-bigendian-machines byte-swapping functions, add missed update
of alternate regression expected file, improve error reporting, remove some
unnecessary code, sync testlo64.c with current testlo.c (it seems to have
been cloned from a very old copy of that), assorted cosmetic improvements.
2012-10-08 18:24:32 -04:00
Alvaro Herrera 878daf2e72 Fix thinko in previous commit
Since postgres.h includes palloc.h, definitions that affect the latter
must be present before the former is included.

Per buildfarm results
2012-10-08 18:33:08 -03:00
Alvaro Herrera 976fa10d20 Add support for easily declaring static inline functions
We already had those, but they forced modules to spell out the function
bodies twice.  Eliminate some duplicates we had already grown.

Extracted from a somewhat larger patch from Andres Freund.
2012-10-08 16:28:01 -03:00
Robert Haas 08c8058ce9 Add #define for UUIDOID.
Phil Sorber and Thom Brown. Reviewed by Albe Laurenz.
2012-10-08 10:15:15 -04:00
Heikki Linnakangas b28cc92d7d Say ANALYZE, not VACUUM, in error message on analyze in hot standby.
Tomonaru Katsumata
2012-10-08 14:17:27 +03:00
Heikki Linnakangas 9c0e2b9182 Fix walsender handling of postmaster shutdown, to not go into endless loop.
This bug was introduced by my patch to use the regular die/quickdie signal
handlers in walsender processes. I tried to make walsender exit at next
CHECK_FOR_INTERRUPTS() by setting ProcDiePending, but that's not enough, you
need to set InterruptPending too. On second thoght, it was not a very good
way to make walsender exit anyway, so use proc_exit(0) instead.

Also, send a CommandComplete message before exiting; that's what we did
before, and you get a nicer error message in the standby that way.

Reported by Thom Brown.
2012-10-08 13:32:14 +03:00
Tom Lane 95d035e66d Autoconfiscate selection of 64-bit int type for 64-bit large object API.
Get rid of the fundamentally indefensible assumption that "long long int"
exists and is exactly 64 bits wide on every platform Postgres runs on.
Instead let the configure script select the type to use for "pg_int64".

This is a bit of a pain in the rear since we do not want to pollute client
namespace with all the random symbols that pg_config.h defines; instead
we have to create a separate generated header file, "pg_config_ext.h".
But now that the infrastructure is there, we might have the ability to
add some other stuff that's long been wanting in this area.
2012-10-07 21:52:43 -04:00
Andrew Dunstan ea72bb8ae5 Fix typo in previous MSC commit. 2012-10-07 19:56:26 -04:00
Andrew Dunstan 33a7101281 Quiet a few MSC compiler warnings. 2012-10-07 17:31:10 -04:00
Tatsuo Ishii 7e2f8ed2b0 Fix compiling errors on Windows platform. Fix wrong usage of
INT64CONST macro. Fix lo_hton64 and lo_ntoh64 not to use int32_t and
uint32_t.
2012-10-07 23:30:31 +09:00
Tatsuo Ishii b51a65f5bf Bump up catalog vesion due to 64-bit large object API functions
addition.
2012-10-07 09:36:20 +09:00
Tatsuo Ishii 461ef73f09 Add API for 64-bit large object access. Now users can access up to
4TB large objects (standard 8KB BLCKSZ case).  For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added.  Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.

Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
2012-10-07 08:36:48 +09:00
Michael Meskes 6e41fa2e5c Fixed test for array boundary.
Instead of continuing if the next character is not an array boundary get_data()
used to continue only on finding a boundary so it was not able to read any
element after the first.
2012-10-05 17:49:17 +02:00
Heikki Linnakangas fd5942c18f Use the regular main processing loop also in walsenders.
The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.
2012-10-05 17:21:12 +03:00
Tom Lane 1997f34db4 getnameinfo_unix has to be taught not to insist on NI_NUMERIC flags, too.
Per testing of previous patch.
2012-10-04 22:54:18 -04:00
Peter Eisentraut 05346c131a PL/pgSQL: rename gram.y to pl_gram.y
This makes the naming inside plpgsql consistent and distinguishes the
file from the backend's gram.y file.  It will also allow easier
refactoring of the bison make rules later on.
2012-10-04 22:40:33 -04:00
Peter Eisentraut c424d0d105 Remove redundant code for getnameinfo() replacement
Our getnameinfo() replacement implementation in getaddrinfo.c failed
unless NI_NUMERICHOST and NI_NUMERICSERV were given as flags, because
it doesn't resolve host names, only numeric IPs.  But per standard,
when those flags are not given, an implementation can still degrade to
not returning host names, so this restriction is unnecessary.  When we
remove it, we can eliminate some code in postmaster.c that apparently
tried to work around that.
2012-10-04 21:45:14 -04:00
Tom Lane e1e60694b4 Make CREATE AGGREGATE complain if the initcond is invalid for the datatype.
The initial transition value is stored as a text string and not fed to the
transition type's input function until runtime (so that values such as
"now" don't get frozen at creation time).  Previously, CREATE AGGREGATE
didn't do anything with it but that, which meant that even erroneous values
would be accepted and not complained of until the aggregate is used.  This
seems unhelpful, and it's confused at least one user, as in Rhys Stewart's
recent report.  It seems worth taking a few more cycles to invoke the input
function and verify that the value is acceptable.  We can't do this if the
transition type is polymorphic, but in normal aggregates we know the actual
transition type so we can call the right input function.
2012-10-04 17:54:53 -04:00
Tom Lane 707263542e Fix parse location tracking for lists that can be empty.
The previous coding of the YYLLOC_DEFAULT macro behaved strangely for empty
productions, assigning the previous nonterminal's location as the parse
location of the result.  The usefulness of that was (at best) debatable
already, but the real problem is that in list-generating nonterminals like
	OptFooList: /* EMPTY */ { ... } | OptFooList Foo { ... } ;
the initially-identified location would get copied up, so that even a
nonempty list would be given a bogus parse location.  Document how to work
around that, and do so for OptSchemaEltList, so that the error condition
just added for CREATE SCHEMA IF NOT EXISTS produces a sane error cursor.
So far as I can tell, there are currently no other cases where the
situation arises, so we don't need other instances of this coding yet.
2012-10-04 17:15:29 -04:00
Heikki Linnakangas 1a956481ba Fix typo in comment, and reword it slightly while we're at it. 2012-10-04 10:35:48 +03:00
Tom Lane fb34e94d21 Support CREATE SCHEMA IF NOT EXISTS.
Per discussion, schema-element subcommands are not allowed together with
this option, since it's not very obvious what should happen to the element
objects.

Fabrízio de Royes Mello
2012-10-03 19:47:11 -04:00
Alvaro Herrera 994c36e01d refactor ALTER some-obj SET OWNER implementation
Remove duplicate implementation of catalog munging and miscellaneous
privilege and consistency checks.  Instead rely on already existing data
in objectaddress.c to do the work.

Author: KaiGai Kohei
Tweaked by me
Reviewed by Robert Haas
2012-10-03 18:07:46 -03:00
Tom Lane 1f91c8ca1d Avoid planner crash/Assert failure with joins to unflattened subqueries.
examine_simple_variable supposed that any RTE_SUBQUERY rel it gets pointed
at must have been planned already.  However, this isn't a safe assumption
because we must do selectivity estimation while generating indexscan paths,
and that code might look at join clauses involving a rel that the loop in
set_base_rel_sizes() hasn't reached yet.  The simplest fix is to play dumb
in such a situation, that is give up trying to extract any stats for the
Var.  This could possibly be improved by making a separate pass over the
RTE list to plan each unflattened subquery before we start the main
planning work --- but that would be pretty invasive and it doesn't seem
worth it, for now at least.  (We couldn't just break set_base_rel_sizes()
into two loops: the prescan would need to handle all subquery rels in the
query, not only those in the current join subproblem.)

This bug was introduced in commit 1cb108efb0,
although I think that subsequent changes may have exposed it more than it
was originally.  Per bug #7580 from Maxim Boguk.
2012-10-03 13:37:53 -04:00
Alvaro Herrera fe3b5eb08a REASSIGN OWNED: consider grants on tablespaces, too
Apparently this was considered in the original code (see commit
cec3b0a9) but I failed to notice that such entries would always be
skipped by the database check at the start of the loop.

Per bugs #7578 by Nikolay, #6116 by tushar.qa@gmail.com.
2012-10-03 12:30:00 -03:00
Heikki Linnakangas 7ae1815961 Return the number of rows processed when COPY is executed through SPI.
You can now get the number of rows processed by a COPY statement in a
PL/pgSQL function with "GET DIAGNOSTICS x = ROW_COUNT".

Pavel Stehule, reviewed by Amit Kapila, with some editing by me.
2012-10-03 14:38:22 +03:00
Heikki Linnakangas bc1229c832 Fix two bugs introduced in the xlog.c split.
The comment explaining the naming of timeline history files was wrong, and
the history file was not being arhived.

Pointed out by Fujii Masao.
2012-10-03 09:15:38 +03:00
Peter Eisentraut 6bd176095b Improve some LDAP authentication error messages 2012-10-02 23:25:05 -04:00
Tom Lane 09ac603c36 Work around unportable behavior of malloc(0) and realloc(NULL, 0).
On some platforms these functions return NULL, rather than the more common
practice of returning a pointer to a zero-sized block of memory.  Hack our
various wrapper functions to hide the difference by substituting a size
request of 1.  This is probably not so important for the callers, who
should never touch the block anyway if they asked for size 0 --- but it's
important for the wrapper functions themselves, which mistakenly treated
the NULL result as an out-of-memory failure.  This broke at least pg_dump
for the case of no user-defined aggregates, as per report from
Matthew Carrington.

Back-patch to 9.2 to fix the pg_dump issue.  Given the lack of previous
complaints, it seems likely that there is no live bug in previous releases,
even though some of these functions were in place before that.
2012-10-02 17:32:42 -04:00
Alvaro Herrera 2164f9a125 Refactor "ALTER some-obj SET SCHEMA" implementation
Instead of having each object type implement the catalog munging
independently, centralize knowledge about how to do it and expand the
existing table in objectaddress.c with enough data about each object
type to support this operation.

Author: KaiGai Kohei
Tweaks by me
Reviewed by Robert Haas
2012-10-02 18:13:54 -03:00
Tom Lane a563d94180 Standardize naming of malloc/realloc/strdup wrapper functions.
We had a number of variants on the theme of "malloc or die", with the
majority named like "pg_malloc", but by no means all.  Standardize on the
names pg_malloc, pg_malloc0, pg_realloc, pg_strdup.  Get rid of pg_calloc
entirely in favor of using pg_malloc0.

This is an essentially cosmetic change, so no back-patch.  (I did find
a couple of places where psql and pg_dump were using plain malloc or
strdup instead of the pg_ versions, but they don't look significant
enough to bother back-patching.)
2012-10-02 15:35:48 -04:00
Heikki Linnakangas 779f80b75d Fix typo in previous warning-silencing patch.
Fujii Masao
2012-10-02 20:00:10 +03:00
Heikki Linnakangas 2a4bbed7b8 Silence compiler warning about pointer type mismatch on some platforms.
timeval.t_sec is of type time_t, which is not always compatible with long.
I'm not sure if this was just harmless warning or a real bug, but this
fixes it, anyway.
2012-10-02 17:46:40 +03:00
Andrew Dunstan 06623df63b Allow a few seconds for Windows to catch up with a directory rename when checking pg_upgrade. 2012-10-02 10:40:57 -04:00
Heikki Linnakangas 93b6d78cf0 Add #includes needed on some platforms in the new files.
Hopefully this makes the *BSD buildfarm animals happy.
2012-10-02 17:19:52 +03:00