
Problem with atomic accesses in pstore on some ARM CPUs


Guenter Roeck

Aug 15, 2016, 7:20:04 PM
Hi,

we are having a problem with atomic accesses in pstore on some ARM
CPUs (specifically rk3288 and rk3399). With those chips, atomic
accesses fail with both pgprot_noncached and pgprot_writecombine
memory. Atomic accesses do work when selecting PAGE_KERNEL protection.

Debugging on rk3399 shows the following crash.

[ 0.912669] Bad mode in Error handler detected, code 0xbf000002 -- SError
[ 0.920140] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 4.4.14 #389
[ 0.926838] Hardware name: Google Kevin (DT)
[ 0.931533] task: ffffffc0edfe0000 ti: ffffffc0edf7c000 task.ti: ffffffc0edf7c000
[ 0.939780] PC is at __ll_sc___cmpxchg_case_mb_4+0x2c/0x5c
[ 0.945811] LR is at 0x1

The "solution" for this problem in various Chrome OS releases is to
disable atomic accesses in pstore entirely, which seems to be a bit
brute-force. The question is what a proper upstream-acceptable solution
might be. Introduce another memory type to select PAGE_KERNEL? Is
there some means to determine whether atomic operations are supported
with a given protection mask? Anything else?

Thanks,
Guenter

Robin Murphy

Aug 16, 2016, 6:40:05 AM
Hi Guenter,

On 16/08/16 00:19, Guenter Roeck wrote:
> Hi,
>
> we are having a problem with atomic accesses in pstore on some ARM
> CPUs (specifically rk3288 and rk3399). With those chips, atomic
> accesses fail with both pgprot_noncached and pgprot_writecombine
> memory. Atomic accesses do work when selecting PAGE_KERNEL protection.

What's the pstore backed by? I'm guessing it's not normal DRAM.

> Debugging on rk3399 shows the following crash.
>
> [ 0.912669] Bad mode in Error handler detected, code 0xbf000002 -- SError
> [ 0.920140] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 4.4.14 #389
> [ 0.926838] Hardware name: Google Kevin (DT)
> [ 0.931533] task: ffffffc0edfe0000 ti: ffffffc0edf7c000 task.ti: ffffffc0edf7c000
> [ 0.939780] PC is at __ll_sc___cmpxchg_case_mb_4+0x2c/0x5c
> [ 0.945811] LR is at 0x1
>
> The "solution" for this problem in various Chrome OS releases is to
> disable atomic accesses in pstore entirely, which seems to be a bit
> brute-force. Question is what a proper upstream-acceptable solution

From that, it sounds like the endpoint doesn't support exclusive
accesses. I imagine you get away with it through a PAGE_KERNEL mapping
by virtue of being write-back cacheable, such that the exclusives end up
being handled by the local monitor at L1 and don't go out to memory, but
I'm not sure even that's necessarily reliable.

In general terms, not doing atomic accesses is probably the only 100%
safe solution.

Robin.

> might be. Introduce another memory type to select PAGE_KERNEL ? Is
> there some means to determine if atomic operations are supported with
> a given protection mask, maybe ? Anything else ?
>
> Thanks,
> Guenter
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-ar...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

Will Deacon

Aug 16, 2016, 6:50:07 AM
On Tue, Aug 16, 2016 at 11:32:04AM +0100, Robin Murphy wrote:
> On 16/08/16 00:19, Guenter Roeck wrote:
> > we are having a problem with atomic accesses in pstore on some ARM
> > CPUs (specifically rk3288 and rk3399). With those chips, atomic
> > accesses fail with both pgprot_noncached and pgprot_writecombine
> > memory. Atomic accesses do work when selecting PAGE_KERNEL protection.
>
> What's the pstore backed by? I'm guessing it's not normal DRAM.

Regardless, pgprot_noncached and pgprot_writecombine map to Device-nGnRnE
and Normal-non-cacheable respectively, and so it's IMP DEP whether or not
exclusives will work there.

> > Debugging on rk3399 shows the following crash.
> >
> > [ 0.912669] Bad mode in Error handler detected, code 0xbf000002 -- SError
> > [ 0.920140] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 4.4.14 #389
> > [ 0.926838] Hardware name: Google Kevin (DT)
> > [ 0.931533] task: ffffffc0edfe0000 ti: ffffffc0edf7c000 task.ti: ffffffc0edf7c000
> > [ 0.939780] PC is at __ll_sc___cmpxchg_case_mb_4+0x2c/0x5c
> > [ 0.945811] LR is at 0x1
> >
> > The "solution" for this problem in various Chrome OS releases is to
> > disable atomic accesses in pstore entirely, which seems to be a bit
> > brute-force. Question is what a proper upstream-acceptable solution

Why do you require atomics to the pstore? If you need to serialise updates
from coherent observers (e.g. CPUs), is it acceptable to use a lock in
normal memory instead?

Will

Guenter Roeck

Aug 16, 2016, 9:20:08 AM
On Tue, Aug 16, 2016 at 3:32 AM, Robin Murphy <robin....@arm.com> wrote:
> Hi Guenter,
>
> On 16/08/16 00:19, Guenter Roeck wrote:
>> Hi,
>>
>> we are having a problem with atomic accesses in pstore on some ARM
>> CPUs (specifically rk3288 and rk3399). With those chips, atomic
>> accesses fail with both pgprot_noncached and pgprot_writecombine
>> memory. Atomic accesses do work when selecting PAGE_KERNEL protection.
>
> What's the pstore backed by? I'm guessing it's not normal DRAM.
>

It is normal DRAM.

>> Debugging on rk3399 shows the following crash.
>>
>> [ 0.912669] Bad mode in Error handler detected, code 0xbf000002 -- SError
>> [ 0.920140] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 4.4.14 #389
>> [ 0.926838] Hardware name: Google Kevin (DT)
>> [ 0.931533] task: ffffffc0edfe0000 ti: ffffffc0edf7c000 task.ti: ffffffc0edf7c000
>> [ 0.939780] PC is at __ll_sc___cmpxchg_case_mb_4+0x2c/0x5c
>> [ 0.945811] LR is at 0x1
>>
>> The "solution" for this problem in various Chrome OS releases is to
>> disable atomic accesses in pstore entirely, which seems to be a bit
>> brute-force. Question is what a proper upstream-acceptable solution
>
> From that, it sounds like the endpoint doesn't support exclusive
> accesses. I imagine you get away with it through a PAGE_KERNEL mapping
> by virtue of being write-back cacheable, such that the exclusives end up
> being handled by the local monitor at L1 and don't go out to memory, but
> I'm not sure even that's necessarily reliable.
>
> In general terms, not doing atomic accesses is probably the only 100%
> safe solution.
>
Not sure I understand. Atomics work for the rest of the DRAM. Can't I
just use the same access protection?

Thanks,
Guenter

Will Deacon

Aug 16, 2016, 9:30:05 AM
On Tue, Aug 16, 2016 at 06:14:53AM -0700, Guenter Roeck wrote:
> On Tue, Aug 16, 2016 at 3:32 AM, Robin Murphy <robin....@arm.com> wrote:
> > On 16/08/16 00:19, Guenter Roeck wrote:
> >> we are having a problem with atomic accesses in pstore on some ARM
> >> CPUs (specifically rk3288 and rk3399). With those chips, atomic
> >> accesses fail with both pgprot_noncached and pgprot_writecombine
> >> memory. Atomic accesses do work when selecting PAGE_KERNEL protection.
> >
> > What's the pstore backed by? I'm guessing it's not normal DRAM.
> >
>
> it is normal DRAM.

In which case, why does it need to be mapped with weird attributes?
Is there an alias in the linear map you can use?

Will

Guenter Roeck

Aug 16, 2016, 9:30:06 AM
Sure, this is just how pstore works today, and I would prefer not
having to hack upstream code. This is normal DRAM. The initialization
code does the following.

	if (pfn_valid(start >> PAGE_SHIFT))
		prz->vaddr = persistent_ram_vmap(start, size, memtype);
	else
		prz->vaddr = persistent_ram_iomap(start, size, memtype);

persistent_ram_vmap() uses atomics; persistent_ram_iomap() uses
spinlocks. For normal DRAM, pfn_valid() returns true.

Thanks,
Guenter

Guenter Roeck

Aug 16, 2016, 11:20:06 AM
I don't really _want_ to do anything besides using pstore as-is, or,
in other words, to have the upstream kernel work with the affected
systems.

The current pstore code runs the following code for memory with
pfn_valid() = true.

	if (memtype)
		prot = pgprot_noncached(PAGE_KERNEL);
	else
		prot = pgprot_writecombine(PAGE_KERNEL);
	...
	vaddr = vmap(pages, page_count, VM_MAP, prot);

It then uses the memory pointed to by vaddr for atomic operations.

In my case, neither protection option works. Everything works fine
(or at least doesn't create an exception) if I use
	vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL);
instead.

Thanks,
Guenter

Mark Rutland

Aug 16, 2016, 1:30:05 PM
This means that the generic ramoops / pstore code is making non-portable
assumptions about memory types.

So _something_ has to happen to that code.

> In my case, both protection options don't work. Everything works fine
> (or at least doesn't create an exception) if I use
> vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL);
> instead.

Architecturally, that will give you a memory type on which we can safely
use atomics.

It would be nice to know why the ramoops/pstore code must use atomics, and
exactly what it's trying to achieve (i.e. is this just for serialisation, or an
attempt to ensure persistence).

Depending on that, it may be possible to fix things more generically by using
memremap by default, for example, and only allowing uncached mappings on those
architectures which support them.

Thanks,
Mark.

Colin Cross

Aug 16, 2016, 1:50:06 PM
persistent_ram uses atomic ops in uncached memory to store the start
and end positions in the ringbuffer so that the state of the
ringbuffer will be valid if the kernel crashes at any time. This was
inherited from Android's ram_console implementation, and worked
through armv7. It has been causing more and more problems recently,
see for example 027bc8b08242c59e19356b4b2c189f2d849ab660 (pstore-ram:
Allow optional mapping with pgprot_noncached) and
7ae9cb81933515dc7db1aa3c47ef7653717e3090 (pstore-ram: Fix hangs by
using write-combine mappings).

Maybe it should be replaced with a spinlock in normal ram protecting
writes to the uncached region.

Guenter Roeck

Aug 16, 2016, 4:30:12 PM
The necessary functions already exist, and are used for memory mapped
with ioremap() / ioremap_wc(). They were introduced with commit
0405a5cec3 ("pstore/ram: avoid atomic accesses for ioremapped
regions"), and the description in that patch sounds quite similar to
the current problem. Given that, would it be acceptable to remove
buffer_start_add_atomic() and buffer_size_add_atomic(), and always use
buffer_start_add_locked() and buffer_size_add_locked() instead? Those
functions still use atomic_set() and atomic_read(), which works fine
in my tests. The only difference is that a spinlock in main memory is
used instead of atomic_cmpxchg().

Thanks,
Guenter

Kees Cook

Aug 16, 2016, 5:00:05 PM
I don't see much of a down side to this. ramoops isn't expected to be
high-bandwidth so trading for a single global lock doesn't really
bother me.

(I've added other folks to CC that have touched this more recently...)

-Kees

--
Kees Cook
Nexus Security

Guenter Roeck

Aug 16, 2016, 8:30:05 PM
On Tue, Aug 16, 2016 at 1:50 PM, Kees Cook <kees...@chromium.org> wrote:

[ ... ]

>>> persistent_ram uses atomic ops in uncached memory to store the start
>>> and end positions in the ringbuffer so that the state of the
>>> ringbuffer will be valid if the kernel crashes at any time. This was
>>> inherited from Android's ram_console implementation, and worked
>>> through armv7. It has been causing more and more problems recently,
>>> see for example 027bc8b08242c59e19356b4b2c189f2d849ab660 (pstore-ram:
>>> Allow optional mapping with pgprot_noncached) and
>>> 7ae9cb81933515dc7db1aa3c47ef7653717e3090 (pstore-ram: Fix hangs by
>>> using write-combine mappings).
>>>
>>> Maybe it should be replaced with a spinlock in normal ram protecting
>>> writes to the uncached region.
>>
>> The necessary functions already exist, and are used for memory mapped
>> with ioremap() / ioremap_wc(). They were introduced with commit
>> 0405a5cec3 ("pstore/ram: avoid atomic accesses for ioremapped
>> regions"), and the description in that patch sounds quite similar to
>> the current problem. Given that, would it be acceptable to remove
>> buffer_start_add_atomic() and buffer_size_add_atomic(), and always use
>> buffer_start_add_locked() and buffer_size_add_locked() instead ? Those
>> functions still use atomic_set() and atomic_read(), which works fine
>> in my tests. The only difference is that a spinlock in main memory is
>> used instead of atomic_cmpxchg().
>
> I don't see much of a down side to this. ramoops isn't expected to be
> high-bandwidth so trading for a single global lock doesn't really
> bother me.
>

Sounds good. I'll submit a patch to address the problem as suggested above.

Guenter

Russell King - ARM Linux

Aug 19, 2016, 5:40:05 AM
On Tue, Aug 16, 2016 at 10:35:52AM -0700, Colin Cross wrote:
> persistent_ram uses atomic ops in uncached memory to store the start
> and end positions in the ringbuffer so that the state of the
> ringbuffer will be valid if the kernel crashes at any time. This was
> inherited from Android's ram_console implementation, and worked
> through armv7.

That statement is actually inaccurate. It may have worked on _some_
ARMv7 implementations, but it's not architecturally compliant.

The exclusive access instructions are not portable to anything but
"normal memory":

It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can
be performed to a memory region with the Device or Strongly-ordered
memory attribute. Unless the implementation documentation explicitly
states that LDREX and STREX operations to a memory region with the
Device or Strongly-ordered attribute are permitted, the effect of
such operations is UNPREDICTABLE.

pgprot_noncached() gives strongly-ordered memory, and so is not a
suitable place for semaphores on all ARMv7 implementations.

Also:

+	if (memtype)
+		va = ioremap(start, size);

this returns *device* memory, which is also unsuitable for all ARMv7
implementations.

It seems that pstore is playing in areas of the architecture which are
implementation defined, so it's no surprise that folk are seeing
different behaviours with different implementations.

The code isn't architecturally wrong; it just isn't portable to all ARMv7
implementations.

So, saying that it works on some ARMv7 implementations is irrelevant.

Note that LDREX and STREX are used for all operations that require
atomicity - iow, atomics and locks.

pgprot_writecombine() and ioremap_wc() will return memory which is
suitable for these exclusive accesses - it maps to the architectures'
"normal memory, uncacheable".

So, I suspect the OP should not be using mem_type=1 or the
"unbuffered" DT attribute, but should leave it as the default
(mem_type=0).


--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Guenter Roeck

Aug 19, 2016, 8:50:10 AM
Unfortunately, pgprot_writecombine() doesn't work in my case (for
rk3288 and rk3399). It was also reported not to work on Tegra Logan by
Nvidia. As mentioned earlier, PAGE_KERNEL works, at least for rk3288
and rk3399.

Guenter

Arnd Bergmann

Aug 22, 2016, 5:10:05 PM
If we have both a cacheable and a noncacheable mapping for the
same DRAM area, things get even worse across many architectures.

IIRC PowerPC will trigger a checkstop if it encounters a valid
cache line for a noncacheable mapping.

If there is only one mapping, this is not a problem, but we probably
want to avoid a write-back cache here: if something serious goes wrong
and the cache contents are lost, the pstore may still be needed for
post-mortem analysis.

Arnd