Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write

Daniel J Blueman

unread,

Jan 12, 2014, 10:30:02 PM1/12/14

to

On Thursday, 9 January 2014 01:10:03 UTC+8, Waiman Long wrote:
> This patch modifies the queue_write_unlock() function to use the
> new smp_store_release() function in another pending patch. It also
> removes the temporary implementation of smp_load_acquire() and
> smp_store_release() function in qrwlock.c.
>
> This patch should only be merged if PeterZ's linux-arch patch patch
> was merged.
>
> Signed-off-by: Waiman Long <Waima...@hp.com>
> Reviewed-by: Paul E. McKenney <pau...@linux.vnet.ibm.com>
> ---
> include/asm-generic/qrwlock.h | 4 +---
> kernel/locking/qrwlock.c | 34 ----------------------------------
> 2 files changed, 1 insertions(+), 37 deletions(-)
>
> diff --git a/include/asm-generic/qrwlock.h
b/include/asm-generic/qrwlock.h
> index 2b9a7b4..4d4bd04 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -179,9 +179,7 @@ static inline void queue_write_unlock(struct
qrwlock *lock)
> /*
> * Make sure that none of the critical section will be leaked out.
> */
> - smp_mb__before_clear_bit();
> - ACCESS_ONCE(lock->cnts.writer) = 0;
> - smp_mb__after_clear_bit();
> + smp_store_release(&lock->cnts.writer, 0)

This will fail compilation, so probably needs further testing with
Peter's load_acquire/store_release barrier patches.

Thanks,
Daniel
--
Daniel J Blueman
Principal Software Engineer, Numascale
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Paul E. McKenney

unread,

Jan 12, 2014, 11:00:03 PM1/12/14

to

Peter's patch just hit -tip, so this should be esay to do. In Waiman's
defense, he does call attention to this in the commit log.

Thanx, Paul

Waiman Long

unread,

Jan 13, 2014, 11:50:05 AM1/13/14

to

Peter,

I found out that the build failure was caused by the fact that the
__native_word() macro (used internally by compiletime_assert_atomic())
allows only a size of 4 or 8 for x86-64. The data type that I used is a
byte. Is there a reason why byte and short are not considered native?

-Longman

Daniel J Blueman

unread,

Jan 13, 2014, 9:30:02 PM1/13/14

to

It seems likely it was implemented like that since there was no existing
need; long can be relied on as the largest native type, so this should
suffice and works here:

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index fe7a686..dac91d7 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -299,8 +299,8 @@ void ftrace_likely_update(struct ftrace_branch_data
*f, int val, int expect);
#endif

/* Is this type a native word size -- useful for atomic operations */
-#ifndef __native_word
-# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) ==
sizeof(long))
+#ifndef __native_type
+# define __native_type(t) (sizeof(t) <= sizeof(long))
#endif

/* Compile time object size, -1 for unknown */
@@ -343,8 +343,8 @@ void ftrace_likely_update(struct ftrace_branch_data
*f, int val, int expect);
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)

#define compiletime_assert_atomic_type(t) \
- compiletime_assert(__native_word(t), \
- "Need native word sized stores/loads for atomicity.")
+ compiletime_assert(__native_type(t), \
+ "Need native sized stores/loads for atomicity.")

/*
* Prevent the compiler from merging or refetching accesses. The compiler

Signed-off-by: Daniel J Blueman <dan...@numascale.com>

--
Daniel J Blueman
Principal Software Engineer, Numascale

Peter Zijlstra

unread,

Jan 14, 2014, 6:10:02 AM1/14/14

to

On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
> >Peter,
> >
> >I found out that the build failure was caused by the fact that the
> >__native_word() macro (used internally by compiletime_assert_atomic())
> >allows only a size of 4 or 8 for x86-64. The data type that I used is a
> >byte. Is there a reason why byte and short are not considered native?
>
> It seems likely it was implemented like that since there was no existing
> need; long can be relied on as the largest native type, so this should
> suffice and works here:

There's Alphas that cannot actually atomically adres a byte; I do not
konw if Linux cares about them, but if it does, we cannot in fact rely
on this in generic primitives like this.

Waiman Long

unread,

Jan 14, 2014, 10:30:02 AM1/14/14

to

On 01/14/2014 06:03 AM, Peter Zijlstra wrote:
> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>> Peter,
>>>
>>> I found out that the build failure was caused by the fact that the
>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>> byte. Is there a reason why byte and short are not considered native?
>> It seems likely it was implemented like that since there was no existing
>> need; long can be relied on as the largest native type, so this should
>> suffice and works here:
> There's Alphas that cannot actually atomically adres a byte; I do not
> konw if Linux cares about them, but if it does, we cannot in fact rely
> on this in generic primitives like this.

Thank for the explanation.

Can we allow architectural override of __native_word() macro? Like

#ifdef __arch_native_word
#define __native_word(t) __arch_native_word(t)
#else
#define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) ==
sizeof(long))
#endif

In this way, we can allow x86 to support byte-based atomic type while
restricting the generic macro to int and long only. I will also modify
the code to use cmpxchg() when byte is not an atomic type.

-Longman

Matt Turner

unread,

Jan 14, 2014, 12:10:02 PM1/14/14

to

On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>> >Peter,
>> >
>> >I found out that the build failure was caused by the fact that the
>> >__native_word() macro (used internally by compiletime_assert_atomic())
>> >allows only a size of 4 or 8 for x86-64. The data type that I used is a
>> >byte. Is there a reason why byte and short are not considered native?
>>
>> It seems likely it was implemented like that since there was no existing
>> need; long can be relied on as the largest native type, so this should
>> suffice and works here:
>
> There's Alphas that cannot actually atomically adres a byte; I do not
> konw if Linux cares about them, but if it does, we cannot in fact rely
> on this in generic primitives like this.

That's right, and thanks for the heads-up. Alpha can only address 4
and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).

The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
Alphas can address < 4 bytes atomically.

Richard Henderson

unread,

Jan 14, 2014, 1:10:03 PM1/14/14

to

On 01/14/2014 09:08 AM, Matt Turner wrote:
> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <pet...@infradead.org> wrote:
>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>> Peter,
>>>>
>>>> I found out that the build failure was caused by the fact that the
>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>> byte. Is there a reason why byte and short are not considered native?
>>>
>>> It seems likely it was implemented like that since there was no existing
>>> need; long can be relied on as the largest native type, so this should
>>> suffice and works here:
>>
>> There's Alphas that cannot actually atomically adres a byte; I do not
>> konw if Linux cares about them, but if it does, we cannot in fact rely
>> on this in generic primitives like this.
>
> That's right, and thanks for the heads-up. Alpha can only address 4
> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>
> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
> Alphas can address < 4 bytes atomically.
>

Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
ppc, mips which, depending on cpu, also lack < 4 byte atomics.

r~

Waiman Long

unread,

Jan 14, 2014, 2:10:02 PM1/14/14

to

On 01/14/2014 01:01 PM, Richard Henderson wrote:
> On 01/14/2014 09:08 AM, Matt Turner wrote:
>> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra<pet...@infradead.org> wrote:
>>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>>> Peter,
>>>>>
>>>>> I found out that the build failure was caused by the fact that the
>>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>>> byte. Is there a reason why byte and short are not considered native?
>>>> It seems likely it was implemented like that since there was no existing
>>>> need; long can be relied on as the largest native type, so this should
>>>> suffice and works here:
>>> There's Alphas that cannot actually atomically adres a byte; I do not
>>> konw if Linux cares about them, but if it does, we cannot in fact rely
>>> on this in generic primitives like this.
>> That's right, and thanks for the heads-up. Alpha can only address 4
>> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>>
>> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
>> Alphas can address< 4 bytes atomically.
>>
> Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
> ppc, mips which, depending on cpu, also lack< 4 byte atomics.
>

I would like to know if the action of writing out a byte (e.g. *byte =
0) is atomic in those architectures or is emulated by a
compiler-generated software read-modify-write.

-Longman

Peter Zijlstra

unread,

Jan 14, 2014, 3:30:02 PM1/14/14

to

On Tue, Jan 14, 2014 at 02:09:30PM -0500, Waiman Long wrote:
> I would like to know if the action of writing out a byte (e.g. *byte = 0) is
> atomic in those architectures or is emulated by a compiler-generated
> software read-modify-write.

So on Alpha pre ev56 something like:

*(volatile u8 *)foo = 0;

_Should_ cause a compile error as the hardware has to do a rmw which is
not compatible with the requirements for volatile -- that said I do not
know if a compiler will actually generate this error.

I can well imagine other load-store archs suffering similar problems,
although I'm not aware of any.

Paul E. McKenney

unread,

Jan 14, 2014, 6:50:02 PM1/14/14

to

On Tue, Jan 14, 2014 at 10:01:04AM -0800, Richard Henderson wrote:
> On 01/14/2014 09:08 AM, Matt Turner wrote:
> > On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> >> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
> >>>> Peter,
> >>>>
> >>>> I found out that the build failure was caused by the fact that the
> >>>> __native_word() macro (used internally by compiletime_assert_atomic())
> >>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
> >>>> byte. Is there a reason why byte and short are not considered native?
> >>>
> >>> It seems likely it was implemented like that since there was no existing
> >>> need; long can be relied on as the largest native type, so this should
> >>> suffice and works here:
> >>
> >> There's Alphas that cannot actually atomically adres a byte; I do not
> >> konw if Linux cares about them, but if it does, we cannot in fact rely
> >> on this in generic primitives like this.
> >
> > That's right, and thanks for the heads-up. Alpha can only address 4
> > and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
> >
> > The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
> > Alphas can address < 4 bytes atomically.
>
> Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
> ppc, mips which, depending on cpu, also lack < 4 byte atomics.

Which means that Alpha should be able to similarly emulate 1-byte and
2-byte atomics, correct?

Thanx, Paul

Linus Torvalds

unread,

Jan 14, 2014, 7:30:01 PM1/14/14

to

On Wed, Jan 15, 2014 at 6:44 AM, Paul E. McKenney
<pau...@linux.vnet.ibm.com> wrote:
>
> Which means that Alpha should be able to similarly emulate 1-byte and
> 2-byte atomics, correct?

Not reasonably, no.

The ldl/stc implementation on early alpha was so broken as to be
unusable. It's not actually done in the cache, it WENT OUT ON THE BUS.
We're talking 70's style "external lock signal" kind of things like
the 8086 did for locked cycles before the advent of caches, the kind
that nobody sane has done for a long long time.

So realistically, you absolutely do not want to use those things to
emulate atomic byte/word accesses. The whole point of "load_acquire()"
and "store_release()" is that it's supposed to be cheaper than a
locked access, and can be done with just a barrier instruction or a
special instruction flag.

If you just want to do a store release, on alpha you'd want to
implement that as a full memory barrier followed by a store. It
doesn't get the advantage of a real release consistency model, but at
least it's not doing an external bus access. But you can only do that
store as a 4-byte or 8-byte store.on the older alphas (byte and word
stores work on newer ones).

Of course, it's entirely possible that nobody cares..

Linus

Paul E. McKenney

unread,

Jan 14, 2014, 9:50:02 PM1/14/14

to

On Wed, Jan 15, 2014 at 07:25:04AM +0700, Linus Torvalds wrote:
> On Wed, Jan 15, 2014 at 6:44 AM, Paul E. McKenney
> <pau...@linux.vnet.ibm.com> wrote:
> >
> > Which means that Alpha should be able to similarly emulate 1-byte and
> > 2-byte atomics, correct?
>
> Not reasonably, no.
>
> The ldl/stc implementation on early alpha was so broken as to be
> unusable. It's not actually done in the cache, it WENT OUT ON THE BUS.
> We're talking 70's style "external lock signal" kind of things like
> the 8086 did for locked cycles before the advent of caches, the kind
> that nobody sane has done for a long long time.
>
> So realistically, you absolutely do not want to use those things to
> emulate atomic byte/word accesses. The whole point of "load_acquire()"
> and "store_release()" is that it's supposed to be cheaper than a
> locked access, and can be done with just a barrier instruction or a
> special instruction flag.
>
> If you just want to do a store release, on alpha you'd want to
> implement that as a full memory barrier followed by a store. It
> doesn't get the advantage of a real release consistency model, but at
> least it's not doing an external bus access. But you can only do that
> store as a 4-byte or 8-byte store.on the older alphas (byte and word
> stores work on newer ones).
>
> Of course, it's entirely possible that nobody cares..

That would be my hope. ;-)

If nobody cares about Alpha period, it is easy. However, the last time
that I tried that approach, they sent me a URL of a wiki showing Alpha
systems still running mainline. But a slow-but-working approach for
Alpha does seem reasonable, even for those still running Linux on Alpha.

Thanx, Paul

Daniel J Blueman

unread,

Jan 14, 2014, 10:10:01 PM1/14/14

to

On 01/15/2014 07:44 AM, Paul E. McKenney wrote:
> On Tue, Jan 14, 2014 at 10:01:04AM -0800, Richard Henderson wrote:
>> On 01/14/2014 09:08 AM, Matt Turner wrote:
>>> On Tue, Jan 14, 2014 at 3:03 AM, Peter Zijlstra <pet...@infradead.org> wrote:
>>>> On Tue, Jan 14, 2014 at 10:28:23AM +0800, Daniel J Blueman wrote:
>>>>>> Peter,
>>>>>>
>>>>>> I found out that the build failure was caused by the fact that the
>>>>>> __native_word() macro (used internally by compiletime_assert_atomic())
>>>>>> allows only a size of 4 or 8 for x86-64. The data type that I used is a
>>>>>> byte. Is there a reason why byte and short are not considered native?
>>>>>
>>>>> It seems likely it was implemented like that since there was no existing
>>>>> need; long can be relied on as the largest native type, so this should
>>>>> suffice and works here:
>>>>
>>>> There's Alphas that cannot actually atomically adres a byte; I do not
>>>> konw if Linux cares about them, but if it does, we cannot in fact rely
>>>> on this in generic primitives like this.
>>>
>>> That's right, and thanks for the heads-up. Alpha can only address 4
>>> and 8 bytes atomically. (LDL_L, LDQ_L, STL_C, STQ_C).
>>>
>>> The Byte-Word extension in EV56 doesn't add new atomics, so in fact no
>>> Alphas can address < 4 bytes atomically.
>>
>> Emulated with aligned 4 byte atomics, and masking. The same is true for arm,
>> ppc, mips which, depending on cpu, also lack < 4 byte atomics.
>
> Which means that Alpha should be able to similarly emulate 1-byte and
> 2-byte atomics, correct?

If it's not possible to guarantee that GCC emits the 4-byte atomics by
using a union, then we could emit the instructions via assembly. We'd
introduce a macro to ensure lock word alignment, and this would be safe
for unsigned counting up to the packed type limit. Maybe that's just too
over-constrained though.

Thanks,
Daniel

--
Daniel J Blueman
Principal Software Engineer, Numascale

Peter Zijlstra

unread,

Jan 15, 2014, 3:10:02 AM1/15/14

to

On Tue, Jan 14, 2014 at 06:39:58PM -0800, Paul E. McKenney wrote:
> > If you just want to do a store release, on alpha you'd want to
> > implement that as a full memory barrier followed by a store. It
> > doesn't get the advantage of a real release consistency model, but at
> > least it's not doing an external bus access. But you can only do that
> > store as a 4-byte or 8-byte store.on the older alphas (byte and word
> > stores work on newer ones).
> >
> > Of course, it's entirely possible that nobody cares..
>
> That would be my hope. ;-)
>
> If nobody cares about Alpha period, it is easy. However, the last time
> that I tried that approach, they sent me a URL of a wiki showing Alpha
> systems still running mainline. But a slow-but-working approach for
> Alpha does seem reasonable, even for those still running Linux on Alpha.

Well, if they're all EV56 or later we're still good as they can actually
do what we need.

But I don't think a ll/sc implementation of the store_release can even
work, because in that case all users of the other bytes also need a
ll/sc around them but how are we to know about them?

So the only real way to allow store_release on 8/16 bit values is by
removing all Alpha support _pre_ EV56 :/

Paul E. McKenney

unread,

Jan 15, 2014, 4:00:01 PM1/15/14

to

On Wed, Jan 15, 2014 at 09:07:53AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 14, 2014 at 06:39:58PM -0800, Paul E. McKenney wrote:
> > > If you just want to do a store release, on alpha you'd want to
> > > implement that as a full memory barrier followed by a store. It
> > > doesn't get the advantage of a real release consistency model, but at
> > > least it's not doing an external bus access. But you can only do that
> > > store as a 4-byte or 8-byte store.on the older alphas (byte and word
> > > stores work on newer ones).
> > >
> > > Of course, it's entirely possible that nobody cares..
> >
> > That would be my hope. ;-)
> >
> > If nobody cares about Alpha period, it is easy. However, the last time
> > that I tried that approach, they sent me a URL of a wiki showing Alpha
> > systems still running mainline. But a slow-but-working approach for
> > Alpha does seem reasonable, even for those still running Linux on Alpha.
>
> Well, if they're all EV56 or later we're still good as they can actually
> do what we need.
>
> But I don't think a ll/sc implementation of the store_release can even
> work, because in that case all users of the other bytes also need a
> ll/sc around them but how are we to know about them?
>
> So the only real way to allow store_release on 8/16 bit values is by
> removing all Alpha support _pre_ EV56 :/

Fair point... We could demand Alpha-specific alignment, but that would
get really ugly really quickly.

But we did drop support for SMP i386 quite some time ago, so perhaps
it is time to drop support for SMP Alpha pre-EV56.

Thanx, Paul

Peter Zijlstra

unread,

Jan 15, 2014, 6:30:01 PM1/15/14

to

On Wed, Jan 15, 2014 at 12:53:46PM -0800, Paul E. McKenney wrote:
> But we did drop support for SMP i386 quite some time ago, so perhaps
> it is time to drop support for SMP Alpha pre-EV56.

So while the primitive is called smp_store_release() the !SMP variant
still does:

*(volatile __type *) = ptr;

which should not compile on any Alpha pre EV56, SMP or no for __type ==
u8.

Peter Zijlstra

unread,

Jan 16, 2014, 5:40:02 AM1/16/14

to

On Thu, Jan 16, 2014 at 06:39:23AM +0700, Linus Torvalds wrote:

> On Jan 16, 2014 6:22 AM, "Peter Zijlstra" <pet...@infradead.org> wrote:
> >
> > So while the primitive is called smp_store_release() the !SMP variant
> > still does:
> >
> > *(volatile __type *) = ptr;
> >
> > which should not compile on any Alpha pre EV56, SMP or no for __type ==
> > u8.
>

> I'm not sure where you get that "should not compile" theory from.
>
> I'm pretty sure it will compile just fine. It will just generate the same
> standard read-modify-write sequence (and not using the ldl/stc sequence
> either). Do you have any actual reason to believe it won't, apart from your
> theoretical wishes of how the world should work?

No, I earlier even said it probably would compile. My usage of 'should'
comes from how we've 'defined' volatile/ACCESS_ONCE in
Documentation/memory-barriers.txt. According to those constraints the
rmw cycle is not proper code.

Paul E. McKenney

unread,

Jan 18, 2014, 5:10:01 AM1/18/14

to

On Thu, Jan 16, 2014 at 11:36:59AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 06:39:23AM +0700, Linus Torvalds wrote:
> > On Jan 16, 2014 6:22 AM, "Peter Zijlstra" <pet...@infradead.org> wrote:
> > >
> > > So while the primitive is called smp_store_release() the !SMP variant
> > > still does:
> > >
> > > *(volatile __type *) = ptr;
> > >
> > > which should not compile on any Alpha pre EV56, SMP or no for __type ==
> > > u8.
> >
> > I'm not sure where you get that "should not compile" theory from.
> >
> > I'm pretty sure it will compile just fine. It will just generate the same
> > standard read-modify-write sequence (and not using the ldl/stc sequence
> > either). Do you have any actual reason to believe it won't, apart from your
> > theoretical wishes of how the world should work?
>
> No, I earlier even said it probably would compile. My usage of 'should'
> comes from how we've 'defined' volatile/ACCESS_ONCE in
> Documentation/memory-barriers.txt. According to those constraints the
> rmw cycle is not proper code.

OK, I will bite... Aside from fine-grained code timing, what code could
you write to tell the difference between a real one-byte store and an
RMW emulating that store?

Thanx, Paul

Peter Zijlstra

unread,

Jan 18, 2014, 6:40:02 AM1/18/14

to

On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> OK, I will bite... Aside from fine-grained code timing, what code could
> you write to tell the difference between a real one-byte store and an
> RMW emulating that store?

Why isn't fine-grained code timing an issue? I'm sure Alpha people will
love it when their machine magically keels over every so often.

Suppose we have two bytes in a word that get concurrent updates:

union {
struct {
u8 a;
u8 b;
};
int word;
} ponies = { .word = 0, };

then two threads concurrently do:

CPU0: CPU1:

ponies.a = 5 ponies.b = 10

At which point you'd expect: a == 5 && b == 10

However, with a rmw you could end up like:

load r, ponies.word
load r, ponies.word
and r, ~0xFF
or r, 5
store ponies.word, r
and r, ~0xFF00
or r, 10 << 8
store ponies.word, r

which gives: a == 0 && b == 10

The same can be had on a single CPU if you make the second RMW an
interrupt.

In fact, we recently had such a RMW issue on PPC64 although from a
slightly different angle, but we managed to hit it quite consistently.
See commit ba1f14fbe7096.

The thing is, if we allow the above RMW 'atomic' store, we have to be
_very_ careful that there cannot be such overlapping stores, otherwise
things will go BOOM!

However, if we already have to make sure there's no overlapping stores,
we might as well write a wide store and not allow the narrow stores to
begin with, to force people to think about the issue.

Paul E. McKenney

unread,

Jan 18, 2014, 7:30:02 AM1/18/14

to

Ah, I was assuming atomic rmw, which for Alpha would be implemented using
the LL and SC instructions. Yes, lots of overhead, but if the CPU
designers chose not to provide a load/store byte...

Thanx, Paul

Peter Zijlstra

unread,

Jan 18, 2014, 7:50:02 AM1/18/14

to

I don't see how ll/sc will help any. Suppose we do the a store as
smp_store_release() using ll/sc but the b store is unaware and doesn't
do an ll/sc.

Then we're still up shit creek without no paddle.

Whatever you're going to do, you need to be intimately aware of what the
other bits in your word are doing.

Paul E. McKenney

unread,

Jan 18, 2014, 4:30:01 PM1/18/14

to

Yes, this requires that -all- updates to the fields in the machine word
in question use atomic rmw. Which would not be pretty from a core-code
perspective. Hence my suggestion of ceasing Linux-kernel support for
DEC Alpha CPUs that don't support byte operations. Also need 16-bit
operations as well, of course...

Thanx, Paul

Linus Torvalds

unread,

Jan 18, 2014, 8:00:01 PM1/18/14

to

On Sat, Jan 18, 2014 at 1:22 PM, Paul E. McKenney
<pau...@linux.vnet.ibm.com> wrote:
>
> Yes, this requires that -all- updates to the fields in the machine word
> in question use atomic rmw. Which would not be pretty from a core-code
> perspective. Hence my suggestion of ceasing Linux-kernel support for
> DEC Alpha CPUs that don't support byte operations. Also need 16-bit
> operations as well, of course...

I'm not seeing this.

Why the hell would you have byte- or halfword-sized versions of the
store_release or load_acquire things on alpha anyway?

What it means is that data structures that do locking or atomics need
to be "int" or "long" on alpha. That has always been true. What do
you claim has changed?

Linus

Paul E. McKenney

unread,

Jan 19, 2014, 3:10:01 AM1/19/14

to

On Sat, Jan 18, 2014 at 04:57:05PM -0800, Linus Torvalds wrote:
> On Sat, Jan 18, 2014 at 1:22 PM, Paul E. McKenney
> <pau...@linux.vnet.ibm.com> wrote:
> >
> > Yes, this requires that -all- updates to the fields in the machine word
> > in question use atomic rmw. Which would not be pretty from a core-code
> > perspective. Hence my suggestion of ceasing Linux-kernel support for
> > DEC Alpha CPUs that don't support byte operations. Also need 16-bit
> > operations as well, of course...
>
> I'm not seeing this.
>
> Why the hell would you have byte- or halfword-sized versions of the
> store_release or load_acquire things on alpha anyway?
>
> What it means is that data structures that do locking or atomics need
> to be "int" or "long" on alpha. That has always been true. What do
> you claim has changed?

OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
on Alpha, at least if the queued rwlocks really do want to atomically
manipulate bytes. After all, the Alpha systems that I know about don't
have enough CPUs to make queued rwlocks necessary anyway.

Much simpler solution!

Is this what you were getting at, or am I missing your point?

Thanx, Paul

Linus Torvalds

unread,

Jan 19, 2014, 3:00:02 PM1/19/14

to

On Sun, Jan 19, 2014 at 12:04 AM, Paul E. McKenney
<pau...@linux.vnet.ibm.com> wrote:
>
> OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
> on Alpha, at least if the queued rwlocks really do want to atomically
> manipulate bytes. After all, the Alpha systems that I know about don't
> have enough CPUs to make queued rwlocks necessary anyway.
>
> Much simpler solution!
>
> Is this what you were getting at, or am I missing your point?

You're missing something.

Just make the "writer" field be an "int" on little-endian archiectures
(like alpha).

There is no reason for that field to be a "char" to begin with, as far
as I can tell, since the padding of the structure means that it
doesn't save any space. But even if that wasn't true, we could make an
arch-specific type for "minimum type for locking".

So my *point* was that it should be easy enough to just make sure that
any data structures used for locking have types that are appropriate
for that locking.

Linus

Paul E. McKenney

unread,

Jan 19, 2014, 8:00:01 PM1/19/14

to

On Sun, Jan 19, 2014 at 11:56:02AM -0800, Linus Torvalds wrote:
> On Sun, Jan 19, 2014 at 12:04 AM, Paul E. McKenney
> <pau...@linux.vnet.ibm.com> wrote:
> >
> > OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
> > on Alpha, at least if the queued rwlocks really do want to atomically
> > manipulate bytes. After all, the Alpha systems that I know about don't
> > have enough CPUs to make queued rwlocks necessary anyway.
> >
> > Much simpler solution!
> >
> > Is this what you were getting at, or am I missing your point?
>
> You're missing something.
>
> Just make the "writer" field be an "int" on little-endian archiectures
> (like alpha).
>
> There is no reason for that field to be a "char" to begin with, as far
> as I can tell, since the padding of the structure means that it
> doesn't save any space. But even if that wasn't true, we could make an
> arch-specific type for "minimum type for locking".

On 64-bit systems (which includes Alpha), agreed, the field can be a
32-bit portion of a 64-bit structure that is then manipulated atomically.
Many 32-bit systems need the reader and writer counts to fix in 32 bits
in order to allow things like queue_read_trylock() to correctly account
for both readers and writers.

If there was a 32-bit system running Linux that did not support byte
accesses, there would be a problem, but I don't know of any such system.

> So my *point* was that it should be easy enough to just make sure that
> any data structures used for locking have types that are appropriate
> for that locking.

So something like the following for the qrwlock definition, with
appropriate C-preprocessor wrappers for the atomic-add accesses?

Thanx, Paul

------------------------------------------------------------------------

typedef struct qrwlock {
union qrwcnts {
#ifdef CONFIG_64B
struct (
int writer;
int reader;
};
atomic_long_t rwa;
u64 rwc;
#else
struct {
#ifdef __LITTLE_ENDIAN
u8 writer; /* Writer state */
#else
u16 r16; /* Reader count - msb */
u8 r8; /* Reader count - lsb */
u8 writer; /* Writer state */
#endif
};
atomic_t rwa; /* Reader/writer atomic */
u32 rwc; /* Reader/writer counts */
} cnts;
#endif
struct mcs_spinlock *waitq; /* Tail of waiting queue */
} arch_rwlock_t;

Waiman Long

unread,

Jan 21, 2014, 10:10:02 AM1/21/14

to

On 01/19/2014 03:04 AM, Paul E. McKenney wrote:
> On Sat, Jan 18, 2014 at 04:57:05PM -0800, Linus Torvalds wrote:
>> On Sat, Jan 18, 2014 at 1:22 PM, Paul E. McKenney
>> <pau...@linux.vnet.ibm.com> wrote:
>>> Yes, this requires that -all- updates to the fields in the machine word
>>> in question use atomic rmw. Which would not be pretty from a core-code
>>> perspective. Hence my suggestion of ceasing Linux-kernel support for
>>> DEC Alpha CPUs that don't support byte operations. Also need 16-bit
>>> operations as well, of course...
>> I'm not seeing this.
>>
>> Why the hell would you have byte- or halfword-sized versions of the
>> store_release or load_acquire things on alpha anyway?
>>
>> What it means is that data structures that do locking or atomics need
>> to be "int" or "long" on alpha. That has always been true. What do
>> you claim has changed?
> OK, another approach would be to never add "select ARCH_USE_QUEUE_RWLOCK"
> on Alpha, at least if the queued rwlocks really do want to atomically
> manipulate bytes. After all, the Alpha systems that I know about don't
> have enough CPUs to make queued rwlocks necessary anyway.
>
> Much simpler solution!
>
> Is this what you were getting at, or am I missing your point?
>
> Thanx, Paul
>

My latest v9 series of qrwlock patch will automatically adapt to the
lack of atomic byte access by using an atomic integer instruction
instead. So the new series should work for pre-EV56 Alpha, it is just a
bit less efficient in this case.

-Longman

Peter Zijlstra

unread,

Jan 21, 2014, 10:50:01 AM1/21/14

to

On Tue, Jan 21, 2014 at 10:02:06AM -0500, Waiman Long wrote:
> My latest v9 series of qrwlock patch will automatically adapt to the lack of
> atomic byte access by using an atomic integer instruction instead. So the
> new series should work for pre-EV56 Alpha, it is just a bit less efficient
> in this case.

See my other email; I don't think you can do that without also changing
the implementation of the queue_read_{try}lock() functions.

Without those changes you can have transient values in your 'read-count'
part of the word and a full word write will wreck things.

Waiman Long

unread,

Jan 21, 2014, 11:20:02 AM1/21/14

to

On 01/21/2014 10:41 AM, Peter Zijlstra wrote:
> On Tue, Jan 21, 2014 at 10:02:06AM -0500, Waiman Long wrote:
>> My latest v9 series of qrwlock patch will automatically adapt to the lack of
>> atomic byte access by using an atomic integer instruction instead. So the
>> new series should work for pre-EV56 Alpha, it is just a bit less efficient
>> in this case.
> See my other email; I don't think you can do that without also changing
> the implementation of the queue_read_{try}lock() functions.
>
> Without those changes you can have transient values in your 'read-count'
> part of the word and a full word write will wreck things.

I don't see any problem with my current logic. If a writer has the write
lock, the writer byte has to have a value of 0xff. So atomically
subtracting 0xff from it will guarantee that the writer byte will become
zero, which is the same as assigning a zero value to that byte. The only
difference is that an atomic subtract instruction will need to be used
instead of a simple byte assignment.

Please let me know if there is any flaw in my logic.

-Longman

Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()

Daniel J Blueman

Paul E. McKenney

Waiman Long

Daniel J Blueman

Peter Zijlstra

Waiman Long

Matt Turner

Richard Henderson

Waiman Long

Peter Zijlstra

Paul E. McKenney

Linus Torvalds

Paul E. McKenney

Daniel J Blueman

Peter Zijlstra

Paul E. McKenney

Peter Zijlstra

Peter Zijlstra

Paul E. McKenney

Peter Zijlstra

Paul E. McKenney

Peter Zijlstra

Paul E. McKenney

Linus Torvalds

Paul E. McKenney

Linus Torvalds

Paul E. McKenney

Waiman Long

Peter Zijlstra

Waiman Long