Commit Graph

12277 Commits

Author SHA1 Message Date
Bruce Momjian 1a2586c1d0 Rerun pgindent with updated typedef list. 2011-11-14 12:12:23 -05:00
Bruce Momjian cdaa45fd4b Run pgindent on range type files, per request from Tom. 2011-11-14 12:08:48 -05:00
Simon Riggs 4de82f7d7c Wakeup WALWriter as needed for asynchronous commit performance.
Previously we waited for wal_writer_delay before flushing WAL. Now
we also wake WALWriter as soon as a WAL buffer page has filled.
Significant effect observed on performance of asynchronous commits
by Robert Haas, attributed to the ability to set hint bits on tuples
earlier and so reducing contention caused by clog lookups.
2011-11-13 09:00:57 +00:00
Robert Haas aa3299f256 Avoid retaining multiple relation locks in RangeVarGetRelid.
If it turns out we've locked the wrong OID, release the old lock.  In
most cases, it's pretty harmless to retain the extra lock, but this
seems tidier and avoids using lock table slots unnecessarily.

Per discussion with Tom Lane.
2011-11-12 01:22:45 -05:00
Robert Haas 71b2b657c0 Revert removal of trace_userlocks, because userlocks aren't gone.
This reverts commit 0180bd6180.
contrib/userlock is gone, but user-level locking still exists,
and is exposed via the pg_advisory* family of functions.
2011-11-10 17:54:27 -05:00
Heikki Linnakangas 2e02280726 Fix another bug in the redo of COPY batches.
I got alignment wrong in the redo routine. Spotted by redoing the log
genereated by copy regression test.
2011-11-10 12:21:43 +02:00
Heikki Linnakangas f81648cb1e Fix bugs in the COPY heap-insert batching patch.
Forgot to call RestoreBkpBlocks() in the redo-function, as pointed out by
Simon Riggs. In redo of a regular heap insert, it's taken care of in
heap_redo(), but this new record type uses the heap2 RM, and heap2_redo()
does not take care of that for you.

Also, failed to reset the vmbuffer and all_visibile_cleared local variables
after switching to a new buffer.
2011-11-09 21:28:25 +02:00
Peter Eisentraut 3ad2c8e168 Clean gettext-files file in clean target
It used to be cleaned in maintainer-clean, but that is inconsistent
with other cleaning of NLS files in nls-global.mk, and it's also wrong
overall, because it's not part of the distribution tarball, which is
the base definition of the maintainer-clean target.
2011-11-09 20:56:19 +02:00
Robert Haas 452d1d193d Fix compiler warning. 2011-11-09 11:14:50 -05:00
Heikki Linnakangas d326d9e8ea In COPY, insert tuples to the heap in batches.
This greatly reduces the WAL volume, especially when the table is narrow.
The overhead of locking the heap page is also reduced. Reduced WAL traffic
also makes it scale a lot better, if you run multiple COPY processes at
the same time.
2011-11-09 10:54:41 +02:00
Tom Lane 57664ed25e Wrap appendrel member outputs in PlaceHolderVars in additional cases.
Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output
expressions appear non-constant and distinct from each other.  This makes
the world safe for add_child_rel_equivalences to do what it does.  Before,
it was possible for that function to add identical expressions to different
EquivalenceClasses, which logically should imply merging such ECs, which
would be wrong; or to improperly add a constant to an EquivalenceClass,
drastically changing its behavior.  Per report from Teodor Sigaev.

The only currently known consequence of this bug is "MergeAppend child's
targetlist doesn't match MergeAppend" planner failures in 9.1 and later.
I am suspicious that there may be other failure modes that could affect
older release branches; but in the absence of any hard evidence, I'll
refrain from back-patching further than 9.1.
2011-11-08 21:14:21 -05:00
Heikki Linnakangas 3b8161723c Make DatumGetInetP() unpack inet datums with a 1-byte header, and add
a new macro, DatumGetInetPP(), that does not. This brings these macros
in line with other DatumGet*P() macros.

Backpatch to 8.3, where 1-byte header varlenas were introduced.
2011-11-08 22:39:43 +02:00
Robert Haas 0e1c4b7d97 Rewrite comment for slightly greater accuracy.
Per an observation from Thom Brown that the old version contained a typo.
2011-11-08 08:11:25 -05:00
Robert Haas bbb6e559c4 Make VACUUM avoid waiting for a cleanup lock, where possible.
In a regular VACUUM, it's OK to skip pages for which a cleanup lock
isn't immediately available; the next VACUUM will deal with them.  If
we're scanning the entire relation to advance relfrozenxid, we might
need to wait, but only if there are tuples on the page that actually
require freezing.  These changes should greatly reduce the incidence
of of vacuum processes getting "stuck".

Simon Riggs and Robert Haas
2011-11-07 21:39:40 -05:00
Heikki Linnakangas ffc703a891 Fix timestamp range subdiff functions, when using float datetimes. 2011-11-07 17:38:43 +02:00
Tom Lane 039680affb Don't assume that a tuple's header size is unchanged during toasting.
This assumption can be wrong when the toaster is passed a raw on-disk
tuple, because the tuple might pre-date an ALTER TABLE ADD COLUMN operation
that added columns without rewriting the table.  In such a case the tuple's
natts value is smaller than what we expect from the tuple descriptor, and
so its t_hoff value could be smaller too.  In fact, the tuple might not
have a null bitmap at all, and yet our current opinion of it is that it
contains some trailing nulls.

In such a situation, toast_insert_or_update did the wrong thing, because
to save a few lines of code it would use the old t_hoff value as the offset
where heap_fill_tuple should start filling data.  This did not leave enough
room for the new nulls bitmap, with the result that the first few bytes of
data could be overwritten with null flag bits, as in a recent report from
Hubert Depesz Lubaczewski.

The particular case reported requires ALTER TABLE ADD COLUMN followed by
CREATE TABLE AS SELECT * FROM ... or INSERT ... SELECT * FROM ..., and
further requires that there be some out-of-line toasted fields in one of
the tuples to be copied; else we'll not reach the troublesome code.
The problem can only manifest in this form in 8.4 and later, because
before commit a77eaa6a95, CREATE TABLE AS or
INSERT/SELECT wouldn't result in raw disk tuples getting passed directly
to heap_insert --- there would always have been at least a junkfilter in
between, and that would reconstitute the tuple header with an up-to-date
t_natts and hence t_hoff.  But I'm backpatching the tuptoaster change all
the way anyway, because I'm not convinced there are no older code paths
that present a similar risk.
2011-11-04 23:22:50 -04:00
Simon Riggs a030bfa6e4 Move user functions related to WAL into xlogfuncs.c 2011-11-04 09:37:17 +00:00
Tom Lane 515e813543 Fix inline_set_returning_function() to allow multiple OUT parameters.
inline_set_returning_function failed to distinguish functions returning
generic RECORD (which require a column list in the RTE, as well as run-time
type checking) from those with multiple OUT parameters (which do not).
This prevented inlining from happening.  Per complaint from Jay Levitt.
Back-patch to 8.4 where this capability was introduced.
2011-11-03 17:54:11 -04:00
Andrew Dunstan 94cd0f1ad8 Do not treat a superuser as a member of every role for HBA purposes.
This makes it possible to use reject lines with group roles.

Andrew Dunstan, reviewd by Robert Haas.
2011-11-03 12:45:02 -04:00
Heikki Linnakangas 4429f6a9e3 Support range data types.
Selectivity estimation functions are missing for some range type operators,
which is a TODO.

Jeff Davis
2011-11-03 13:42:15 +02:00
Tom Lane 7e3bf99baa Fix handling of PlaceHolderVars in nestloop parameter management.
If we use a PlaceHolderVar from the outer relation in an inner indexscan,
we need to reference the PlaceHolderVar as such as the value to be passed
in from the outer relation.  The previous code effectively tried to
reconstruct the PHV from its component expression, which doesn't work since
(a) the Vars therein aren't necessarily bubbled up far enough, and (b) it
would be the wrong semantics anyway because of the possibility that the PHV
is supposed to have gone to null at some point before the current join.
Point (a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers.
Per report from Roger Niederland.
2011-11-03 00:50:58 -04:00
Tom Lane 1a77f8b63d Avoid scanning nulls at the beginning of a btree index scan.
If we have an inequality key that constrains the other end of the index,
it doesn't directly help us in doing the initial positioning ... but it
does imply a NOT NULL constraint on the index column.  If the index stores
nulls at this end, we can use the implied NOT NULL condition for initial
positioning, just as if it had been stated explicitly.  This avoids wasting
time when there are a lot of nulls in the column.  This is the reverse of
the examples given in bugs #6278 and #6283, which were about failing to
stop early when we encounter nulls at the end of the indexscan.
2011-11-02 19:35:48 -04:00
Tom Lane 882368e854 Fix btree stop-at-nulls logic properly.
As pointed out by Naoya Anzai, my previous try at this was a few bricks
shy of a load, because I had forgotten that the initial-positioning logic
might not try to skip over nulls at the end of the index the scan will
start from.  We ought to fix that, because it represents an unnecessary
inefficiency, but first let's get the scan-stop logic back to a safe
state.  With this patch, we preserve the performance benefit requested
in bug #6278 for the case of scanning forward into NULLs (in a NULLS
LAST index), but the reverse case of scanning backward across NULLs
when there's no suitable initial-positioning qual is still inefficient.
2011-11-02 17:53:49 -04:00
Simon Riggs 750f70b0fe Update more comments about checkpoints being done by bgwriter 2011-11-02 17:15:35 +00:00
Simon Riggs 18fb9d8d21 Reduce checkpoints and WAL traffic on low activity database server
Previously, we skipped a checkpoint if no WAL had been written since
last checkpoint, though this does not appear in user documentation.
As of now, we skip a checkpoint until we have written at least one
enough WAL to switch the next WAL file. This greatly reduces the
level of activity and number of WAL messages generated by a very
low activity server. This is safe because the purpose of a checkpoint
is to act as a starting place for a recovery, in case of crash.
This patch maintains minimal WAL volume for replay in case of crash,
thus maintaining very low crash recovery time.
2011-11-02 15:26:33 +00:00
Simon Riggs 9aceb6ab3c Refactor xlog.c to create src/backend/postmaster/startup.c
Startup process now has its own dedicated file, just like all other
special/background processes. Reduces role and size of xlog.c
2011-11-02 14:25:01 +00:00
Simon Riggs 86e3364899 Derive oldestActiveXid at correct time for Hot Standby.
There was a timing window between when oldestActiveXid was derived
and when it should have been derived that only shows itself under
heavy load. Move code around to ensure correct timing of derivation.
No change to StartupSUBTRANS() code, which is where this failed.

Bug report by Chris Redekop
2011-11-02 08:54:56 +00:00
Simon Riggs 10b7c686e5 Start Hot Standby faster when initial snapshot is incomplete.
If the initial snapshot had overflowed then we can start whenever
the latest snapshot is empty, not overflowed or as we did already,
start when the xmin on primary was higher than xmax of our starting
snapshot, which proves we have full snapshot data.

Bug report by Chris Redekop
2011-11-02 08:47:43 +00:00
Simon Riggs 2296e62a32 Remove spurious entry from missed catch while patch juggling 2011-11-02 08:37:52 +00:00
Simon Riggs f8409b39d1 Fix timing of Startup CLOG and MultiXact during Hot Standby
Patch by me, bug report by Chris Redekop, analysis by Florian Pflug
2011-11-02 08:07:44 +00:00
Robert Haas c2891b46a4 Initialize myProcLocks queues just once, at postmaster startup.
In assert-enabled builds, we assert during the shutdown sequence that
the queues have been properly emptied, and during process startup that
we are inheriting empty queues.  In non-assert enabled builds, we just
save a few cycles.
2011-11-01 22:44:54 -04:00
Tom Lane 391af9f784 Preserve Var location information during flatten_join_alias_vars.
This allows us to give correct syntax error pointers when complaining
about ungrouped variables in a join query with aggregates or GROUP BY.
It's pretty much irrelevant for the planner's use of the function, though
perhaps it might aid debugging sometimes.
2011-11-01 22:13:11 -04:00
Tom Lane 08e261cbc9 Fix race condition with toast table access from a stale syscache entry.
If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.

The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.

The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it.  (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field.  We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.)  An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.

Back-patch to all supported versions.  The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.
2011-11-01 19:49:58 -04:00
Peter Eisentraut 654e1f96b0 Clean up whitespace and indentation in parser and scanner files
These are not touched by pgindent, so clean them up a bit manually.
2011-11-01 21:51:30 +02:00
Simon Riggs f3ebaad45b Comment changes to show bgwriter no longer performs checkpoints. 2011-11-01 18:48:47 +00:00
Simon Riggs 3ba182056f Have checkpointer send stats once each processing loop.
Noted by Fujii Masao
2011-11-01 18:38:27 +00:00
Simon Riggs bf405ba8e4 Add new file for checkpointer.c 2011-11-01 18:07:29 +00:00
Simon Riggs 806a2aee37 Split work of bgwriter between 2 processes: bgwriter and checkpointer.
bgwriter is now a much less important process, responsible for page
cleaning duties only. checkpointer is now responsible for checkpoints
and so has a key role in shutdown. Later patches will correct doc
references to the now old idea that bgwriter performs checkpoints.
Has beneficial effect on performance at high write rates, but mainly
refactoring to more easily allow changes for power reduction by
simplifying previously tortuous code around required to allow page
cleaning and checkpointing to time slice in the same process.

Patch by me, Review by Dickson Guedes
2011-11-01 17:14:47 +00:00
Tom Lane 6980f817e8 Stop btree indexscans upon reaching nulls in either direction.
The existing scan-direction-sensitive tests were overly complex, and
failed to stop the scan in cases where it's perfectly legitimate to do so.
Per bug #6278 from Maksym Boguk.

Back-patch to 8.3, which is as far back as the patch applies easily.
Doesn't seem worth sweating over a relatively minor performance issue in
8.2 at this late date.  (But note that this was a performance regression
from 8.1 and before, so 8.2 is being left as an outlier.)
2011-10-31 16:40:04 -04:00
Tom Lane 6743a878a4 Support more locale-specific formatting options in cash_out().
The POSIX spec defines locale fields for controlling the ordering of the
value, sign, and currency symbol in monetary output, but cash_out only
supported a small subset of these options.  Fully implement p/n_sign_posn,
p/n_cs_precedes, and p/n_sep_by_space per spec.  Fix up cash_in so that
it will accept all these format variants.

Also, make sure that thousands_sep is only inserted to the left of the
decimal point, as required by spec.

Per bug #6144 from Eduard Kracmar and discussion of bug #6277.  This patch
includes some ideas from Alexander Lakhin's proposed patch, though it is
very different in detail.
2011-10-30 15:02:58 -04:00
Tom Lane eb5834d5af Further improvement of make_greater_string.
Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position.  To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code.  Remove unnecessary logic to
restore the old character value on failure.  Additional comment and
formatting cleanup.
2011-10-30 12:22:11 -04:00
Robert Haas fae54e4a16 Update visibilitymap.c header comments.
Recent work on index-only scans left this somewhat out of date.
2011-10-29 14:46:59 -04:00
Tom Lane 7609239f3e Fix assorted bogosities in cash_in() and cash_out().
cash_out failed to handle multiple-byte thousands separators, as per bug
#6277 from Alexander Law.  In addition, cash_in didn't handle that either,
nor could it handle multiple-byte positive_sign.  Both routines failed to
support multiple-byte mon_decimal_point, which I did not think was worth
changing, but at least now they check for the possibility and fall back to
using '.' rather than emitting invalid output.  Also, make cash_in handle
trailing negative signs, which formerly it would reject.  Since cash_out
generates trailing negative signs whenever the locale tells it to, this
last omission represents a fail-to-reload-dumped-data bug.  IMO that
justifies patching this all the way back.
2011-10-29 14:32:06 -04:00
Robert Haas 78d523b633 Improve make_greater_string() with encoding-specific incrementers.
This infrastructure doesn't in any way guarantee that the character
we produce will sort before the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.

Kyotaro Horiguchi, with various adjustments by me.
2011-10-29 14:22:20 -04:00
Robert Haas 53f1ca59b5 Allow hint bits to be set sooner for temporary and unlogged tables.
We need not wait until the commit record is durably on disk, because
in the event of a crash the page we're updating with hint bits will
be gone anyway.  Per off-list report from Heikki Linnakangas, this
can significantly degrade the performance of unlogged tables; I was
able to show a 2x speedup from this patch on a pgbench run with scale
factor 15.  In practice, this will mostly help small, heavily updated
tables, because on larger tables you're unlikely to run into the same
row again before the commit record makes it out to disk.
2011-10-28 17:08:09 -04:00
Heikki Linnakangas cbf65509bb Fix the number of lwlocks needed by the "fast path" lock patch. It needs
one lock per backend or auxiliary process - the need for a lock for each
aux processes was not accounted for in NumLWLocks(). No-one noticed,
because the three locks needed for the three aux processes fit into the
few extra lwlocks we allocate for 3rd party modules that don't call
RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS, 4 by default).
2011-10-27 22:39:58 +03:00
Tom Lane 3e4b3465b6 Improve planner's ability to recognize cases where an IN's RHS is unique.
If the right-hand side of a semijoin is unique, then we can treat it like a
normal join (or another way to say that is: we don't need to explicitly
unique-ify the data before doing it as a normal join).  We were recognizing
such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP
BY decoration, but there's another way: if the RHS is a plain relation with
unique indexes, we can check if any of the indexes prove the output is
unique.  Most of the infrastructure for that was there already in the join
removal code, though I had to rearrange it a bit.  Per reflection about a
recent example in pgsql-performance.
2011-10-26 17:52:29 -04:00
Tom Lane 1e3b21dd5e Change FK trigger naming convention to fix self-referential FKs.
Use names like "RI_ConstraintTrigger_a_NNNN" for FK action triggers and
"RI_ConstraintTrigger_c_NNNN" for FK check triggers.  This ensures the
action trigger fires first in self-referential cases where the very same
row update fires both an action and a check trigger.  This change provides
a non-probabilistic solution for bug #6268, at the risk that it could break
client code that is making assumptions about the exact names assigned to
auto-generated FK triggers.  Hence, change this in HEAD only.  No need for
forced initdb since old triggers continue to work fine.
2011-10-26 13:19:42 -04:00
Tom Lane 58958726ff Change FK trigger creation order to better support self-referential FKs.
When a foreign-key constraint references another column of the same table,
row updates will queue both the PK's ON UPDATE action and the FK's CHECK
action in the same event.  The ON UPDATE action must execute first, else
the CHECK will check a non-final state of the row and possibly throw an
inappropriate error, as seen in bug #6268 from Roman Lytovchenko.

Now, the firing order of multiple triggers for the same event is determined
by the sort order of their pg_trigger.tgnames, and the auto-generated names
we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the
trigger OID.  So most of the time the firing order is the same as creation
order, and so rearranging the creation order fixes it.

This patch will fail to fix the problem if the OID counter wraps around or
adds a decimal digit (eg, from 99999 to 100000) while we are creating the
triggers for an FK constraint.  Given the small odds of that, and the low
usage of self-referential FKs, we'll live with that solution in the back
branches.  A better fix is to change the auto-generated names for FK
triggers, but it seems unwise to do that in stable branches because there
may be client code that depends on the naming convention.  We'll fix it
that way in HEAD in a separate patch.

Back-patch to all supported branches, since this bug has existed for a long
time.
2011-10-26 13:02:28 -04:00
Magnus Hagander a87b9ae161 Make event_source visible on all platforms
On non-windows platform, we just ignore any value set there.

Noted by Jaime Casanova
2011-10-25 22:40:58 +02:00