I failed to recognize that pg_atomic_uint64 wasn't guaranteed to be 8
byte aligned on some 32bit platforms - which it has to be on some
platforms to guarantee the desired atomicity and which we assert.
As this is all compiler specific code anyway we can just rely on
compiler specific tricks to enforce alignment.
I've been unable to find concrete documentation about the version that
introduce the sunpro alignment support, so that might need additional
guards.
I've verified that this works with gcc x86 32bit, but I don't have
access to any other 32bit environment.
Discussion: op.xpsjdkil0sbe7t@vld-kuci
Per report from Vladimir Koković.
* Don't play tricks for a more efficient pg_atomic_clear_flag() in the
generic gcc implementation. The old version was broken on gcc < 4.7
on !x86 platforms. Per buildfarm member chipmunk.
* Make usage of __atomic() fences depend on HAVE_GCC__ATOMIC_INT32_CAS
instead of HAVE_GCC__ATOMIC_INT64_CAS - there's platforms with 32bit
support that don't support 64bit atomics.
* Blindly fix two superflous #endif in generic-xlc.h
* Check for --disable-atomics in platforms but x86.
Some x86 32bit versions of gcc apparently generate references to the
nonexistant %sil register when using when using the r input
constraint, but not with the =q constraint. The latter restricts
allocations to a/b/c/d which should all work.
Several upcoming performance/scalability improvements require atomic
operations. This new API avoids the need to splatter compiler and
architecture dependent code over all the locations employing atomic
ops.
For several of the potential usages it'd be problematic to maintain
both, a atomics using implementation and one using spinlocks or
similar. In all likelihood one of the implementations would not get
tested regularly under concurrency. To avoid that scenario the new API
provides a automatic fallback of atomic operations to spinlocks. All
properties of atomic operations are maintained. This fallback -
obviously - isn't as fast as just using atomic ops, but it's not bad
either. For one of the future users the atomics ontop spinlocks
implementation was actually slightly faster than the old purely
spinlock using implementation. That's important because it reduces the
fear of regressing older platforms when improving the scalability for
new ones.
The API, loosely modeled after the C11 atomics support, currently
provides 'atomic flags' and 32 bit unsigned integers. If the platform
efficiently supports atomic 64 bit unsigned integers those are also
provided.
To implement atomics support for a platform/architecture/compiler for
a type of atomics 32bit compare and exchange needs to be
implemented. If available and more efficient native support for flags,
32 bit atomic addition, and corresponding 64 bit operations may also
be provided. Additional useful atomic operations are implemented
generically ontop of these.
The implementation for various versions of gcc, msvc and sun studio have
been tested. Additional existing stub implementations for
* Intel icc
* HUPX acc
* IBM xlc
are included but have never been tested. These will likely require
fixes based on buildfarm and user feedback.
As atomic operations also require barriers for some operations the
existing barrier support has been moved into the atomics code.
Author: Andres Freund with contributions from Oskari Saarenmaa
Reviewed-By: Amit Kapila, Robert Haas, Heikki Linnakangas and Álvaro Herrera
Discussion: CA+TgmoYBW+ux5-8Ja=Mcyuy8=VXAnVRHp3Kess6Pn3DMXAPAEA@mail.gmail.com,
20131015123303.GH5300@awork2.anarazel.de,
20131028205522.GI20248@awork2.anarazel.de