Support for the race detector on ARM


Owen Waller

Feb 23, 2017, 12:47:56 PM
to golang-nuts
Hi Everyone,

I am assuming that the race detector is not yet supported on ARM hardware. Having just tried to cross compile some code I am seeing:

go build: -race and -msan are only supported on linux/amd64, freebsd/amd64, darwin/amd64 and windows/amd64.

The 1.8 docs back this up as well.

Does anybody have an idea of when (or even if) support for the race detector on ARM will arrive? In my case, specifically for ARMv6/ARMv6K. If not, should I open a GitHub issue to at least track this?

Ideally, I'd like support via cross compiling, but in the short term I could certainly live with having to rebuild on ARM hardware.

Thanks

Owen

Ian Lance Taylor

Feb 23, 2017, 2:28:23 PM
to go-...@kulawe.com, golang-nuts
Go's race detector is based on and uses Thread Sanitizer, which has
only been implemented for amd64. I'm not aware of any effort to
extend thread sanitizer to other processors. That would have to be
done before there is any likelihood of Go supporting it.

Ian

Owen Waller

Feb 23, 2017, 4:35:00 PM
to Ian Lance Taylor, golang-nuts
Hi Ian,
> Go's race detector is based on and uses Thread Sanitizer, which has
> only been implemented for amd64. I'm not aware of any effort to
> extend thread sanitizer to other processors. That would have to be
> done before there is any likelihood of Go supporting it.
>
> Ian

Thanks for the help. 

After a little digging, the best I can come up with is that there might be an AArch64 port. That would cover 64-bit ARMv8 at least.
The reference I've found is a post on the thread sanitizer Google group, which leads to a discussion on llvm.org that suggests it's done (maybe...).

What would be the best way to confirm that AArch64 support exists, since that seems to be a prerequisite for at least an ARMv8 race detector?

I can't find any existing solution for the 32-bit ARMs (v5, v6, v7). Thread Sanitizer seems to need a whole heap of memory, so there is a very real risk of running out of RAM. But then the typical programs that these cores run are much smaller. If we were talking about programs that only used tens to hundreds of megabytes of RAM, you'd have a reasonably good chance of 5-10x those amounts being free. So maybe there would be a case for trying to port it to 32-bit platforms, but I admit that is a much bigger question.

Owen

[1] This post may also be related, as it discusses getting the thread sanitizer running on an AMD AArch64 board with a 42-bit VA space.
[2] There also seems to be a PPC64 port.
If this is also true, then it opens up using the race detector on the PPC64 Go ports.

Ian Lance Taylor

Feb 23, 2017, 8:28:59 PM
to go-...@kulawe.com, Dmitry Vyukov, golang-nuts
[ +dvyukov ]

Dmitry Vyukov

Feb 24, 2017, 4:34:05 AM
to Ian Lance Taylor, go-...@kulawe.com, golang-nuts
C++ ThreadSanitizer works on arm64, so making it work for Go should be
a modest amount of work.
ThreadSanitizer does not work on any 32-bit platforms. It assumes that
it can reserve huge contiguous chunks of address space for various
things.
ThreadSanitizer does not depend on the underlying hardware memory model;
it checks against the abstract language memory model. So even if your
production code runs on arm32, you can still test it for races on
x86_64.
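
To make the last point concrete, here is a minimal sketch of the kind of code one might run under the race detector on an amd64 machine even though it ships on arm32. The `Counter` type and the `hammer` helper are hypothetical names invented for this illustration; nothing here comes from the poster's library.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical stand-in for shared state inside a library.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc is safe for concurrent use because the mutex orders every access to n.
func (c *Counter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Value returns the current count.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// hammer calls Inc from several goroutines at once and returns the final count.
func hammer(goroutines, perGoroutine int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(hammer(8, 1000))
}
```

Running this with `go run -race` on a supported platform should stay quiet; removing the `Lock`/`Unlock` calls from `Inc` is the kind of change the detector is designed to flag, independent of which CPU the production build targets.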

Owen Waller

Feb 24, 2017, 6:39:44 AM
to Dmitry Vyukov, Ian Lance Taylor, golang-nuts
Hi Dmitry,

> C++ ThreadSanitizer works on arm64. So making it work for Go should be
> a modest amount of work.

Thanks for confirming this. Should I now open an issue on GitHub so we can track adding this?
Also, does it exist for any other platforms? There are hints of PPC64. What about MIPS64, or any of the other platforms that Go supports (at least on Linux), such as SPARC64?

> ThreadSanitizer does not work on any 32-bit platforms. It assumes that
> it can reserve huge contiguous chunks of address space for various
> things.

I have no idea how ThreadSanitizer works, but can you clarify what you mean by "huge"?
I'm coming at this from an embedded angle, so huge to me is a program that swallows tens of megabytes of RAM.

32-bit support would potentially also allow support on Android, on current hardware.

> ThreadSanitizer is not dependent on the underlying hardware memory model,
> it checks against the abstract language memory model. So even if your
> production code runs on arm32, you can well test it for races on
> x86_64.

Unfortunately I'm in a case where this isn't possible. I'm trying to make sure that a library I am writing is goroutine-safe. The library itself writes directly to the i2c bus on a Raspberry Pi to control an IC. The code isn't ARM-only (per se), but the hardware element means it is.

Owen

Dmitry Vyukov

Feb 24, 2017, 7:10:15 AM
to go-...@kulawe.com, Ian Lance Taylor, golang-nuts
On Fri, Feb 24, 2017 at 2:39 PM, Owen Waller <go-...@kulawe.com> wrote:
> Hi Dmitry,
>
> C++ ThreadSanitizer works on arm64. So making it work for Go should be
> a modest amount of work.
>
> Thanks for confirming this. Should I now open an issue on GitHub so we can
> track adding this?

If there is somebody who is willing to work on this, then yes. Otherwise, maybe.

> Also, does it exist for any other platforms? There are hints of PPC64. What
> about MIPS64? or any of the other platforms that Go supports (at least on
> linux) - SPARC64?

Supported platforms are enumerated in
llvm/projects/compiler-rt/lib/tsan/rtl/tsan_platform.h
I see x86_64, aarch64, mips64 and powerpc64


> ThreadSanitizer does not work on any 32-bit platforms. It assumes that
> it can reserve huge contiguous chunks of address space for various
> things.
>
> I have no idea how ThreadSanitizer works, but can you clarify what you
> mean by "huge"?
> I'm coming at this from an embedded angle, so huge to me is a program that
> swallows tens of megabytes of RAM.

Tsan reserves 4X for shadow memory (where X is the amount of memory where
user data resides), 0.5X for another shadow memory, 1TB for heap and
128GB for thread info. You can see the details in
llvm/projects/compiler-rt/lib/tsan/rtl/tsan_platform.h.
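
As a rough worked example of those figures, the sketch below adds up the reservations for a given amount of user data and compares the total with the size of an address space. It treats "TB" and "GB" as binary units and assumes the reservations simply sum, which is a simplification for illustration; the function names are made up and are not part of tsan.

```go
package main

import "fmt"

const (
	GiB = uint64(1) << 30
	TiB = uint64(1) << 40
)

// reservedBytes estimates the address space reserved for userBytes of user
// data, using the figures quoted above: 4X shadow, 0.5X second shadow,
// 1 TB heap and 128 GB thread info (read here as TiB and GiB, an assumption).
func reservedBytes(userBytes uint64) uint64 {
	shadow := 4 * userBytes
	meta := userBytes / 2
	return shadow + meta + 1*TiB + 128*GiB
}

// fitsIn reports whether the user data plus the reservations fit in an
// address space of the given width. bits must be below 64.
func fitsIn(userBytes uint64, bits uint) bool {
	return userBytes+reservedBytes(userBytes) <= uint64(1)<<bits
}

func main() {
	user := uint64(64) << 20 // hypothetical 64 MiB of user data
	fmt.Printf("reserved: %d MiB\n", reservedBytes(user)>>20)
	fmt.Println("fits in 32 bits:", fitsIn(user, 32))
	fmt.Println("fits in 47 bits:", fitsIn(user, 47))
}
```

The fixed 1 TB and 128 GB regions alone exceed a 4 GiB address space, which is why the existing layouts cannot be used unchanged on 32-bit hardware.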



> 32bit support would potentially also allow support on Android - on current
> hardware.

There are 64-bit Android devices already. I would rather test with tsan
there. That's what we do for kernel ASAN.


> ThreadSanitizer is not dependent on the underlying hardware memory model,
> it checks against the abstract language memory model. So even if your
> production code runs on arm32, you can well test it for races on
> x86_64.
>
> Unfortunately I'm in a case where this isn't possible. I'm trying to make
> sure that a library I am writing is goroutine-safe. The library itself
> writes directly to the i2c bus on a Raspberry Pi to control an IC. The code
> isn't ARM-only (per se), but the hardware element means it is.

Maybe it's possible to stub the i2c part? That would make testing much
simpler regardless of tsan.
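
One way the stubbing suggestion might look in Go is sketched below. The `Bus` interface, `fakeBus`, `Chip` and `exercise` are all hypothetical names; the real i2c driver's API is not shown in this thread, so this only illustrates the general shape of hiding the hardware behind an interface.

```go
package main

import (
	"fmt"
	"sync"
)

// Bus is a hypothetical abstraction over the i2c write path. The real
// driver would be wrapped to satisfy it on the Raspberry Pi.
type Bus interface {
	Write(addr uint8, data []byte) error
}

// fakeBus counts writes in memory. It deliberately has no lock of its own,
// so unsynchronized callers would show up as a race on its fields.
type fakeBus struct {
	writes int
}

func (f *fakeBus) Write(addr uint8, data []byte) error {
	f.writes++
	return nil
}

// Chip is a hypothetical library type that serializes access to the bus.
type Chip struct {
	mu   sync.Mutex
	bus  Bus
	addr uint8
}

// SetRegister writes one register value while holding the chip's mutex.
func (c *Chip) SetRegister(reg, val byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.bus.Write(c.addr, []byte{reg, val})
}

// exercise calls SetRegister from n goroutines and returns the write count.
func exercise(n int) int {
	fb := &fakeBus{}
	chip := &Chip{bus: fb, addr: 0x20}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v byte) {
			defer wg.Done()
			_ = chip.SetRegister(0x01, v)
		}(byte(i))
	}
	wg.Wait()
	return fb.writes
}

func main() {
	fmt.Println(exercise(16))
}
```

With the hardware behind an interface, the concurrency logic can be run under `-race` on an amd64 development machine, while the real bus implementation is only linked in on the ARM target.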

Owen Waller

Feb 24, 2017, 1:24:44 PM
to Dmitry Vyukov, Ian Lance Taylor, golang-nuts
Hi Dmitry,

> If there is somebody who is willing to work on this, then yes. Otherwise, maybe.

I have created https://github.com/golang/go/issues/19273 in order to track this.

> ThreadSanitizer does not work on any 32-bit platforms. It assumes that it can reserve huge contiguous chunks of address space for various things.
>
> Tsan reserves 4X for shadow memory (where X is the amount of memory where
> user data resides), 0.5X for another shadow memory, 1TB for heap and
> 128GB for thread info. You can see the details in
> llvm/projects/compiler-rt/lib/tsan/rtl/tsan_platform.h.

Those ranges are well beyond the 32-bit address space. But, having had a quick look at the tsan_platform.h file, it looks like the file contains memory layouts for hardware with 39, 42, 44, 46 and 64 bit address spaces, at least for C/C++ programs if not Go.

Given that these ranges are just defined as constants in the code, is there a fundamental reason why these ranges could not be shrunk to fit into a 32-bit address space? Does the thread sanitizer require some (assembly) instructions that are only found in 64-bit hardware, for example?

> Maybe it's possible to stub the i2c part? That would make testing much
> simpler regardless of tsan.

In this specific case, maybe, yes. It's not my i2c driver, so I'd need to investigate just how much work and how much change would be required.

Owen

Dmitry Vyukov

Feb 24, 2017, 1:32:56 PM
to go-...@kulawe.com, Ian Lance Taylor, golang-nuts
On Fri, Feb 24, 2017 at 9:24 PM, Owen Waller <go-...@kulawe.com> wrote:
> Hi Dmitry,
>
> If there is somebody who is willing to work on this, then yes. Otherwise,
> maybe.
>
> I have created https://github.com/golang/go/issues/19273 in order to track
> this.
>
> ThreadSanitizer does not work on any 32-bit platforms. It assumes that
> it can reserve huge contiguous chunks of address space for various
> things.
>
> Tsan reserves 4X for shadow memory (where X is the amount of memory where
> user data resides), 0.5X for another shadow memory, 1TB for heap and
> 128GB for thread info. You can see the details in
> llvm/projects/compiler-rt/lib/tsan/rtl/tsan_platform.h.
>
> Those ranges are well beyond the 32-bit address space. But, having had a
> quick look at the tsan_platform.h file, it looks like the file contains
> memory layouts for hardware with 39, 42, 44, 46 and 64 bit address spaces,
> at least for C/C++ programs if not Go.
>
> Given that these ranges are just defined as constants in the code, is there a
> fundamental reason why these ranges could not be shrunk to fit into a 32-bit
> address space? Does the thread sanitizer require some (assembly) instructions
> that are only found in 64-bit hardware, for example?

No fundamental reason.
Tsan requires 64-bit atomic loads and stores.
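
For readers on the Go side, the sketch below shows what a 64-bit atomic load and store look like with the standard `sync/atomic` package. It is not how the tsan runtime itself (which is C++) performs them; the `tornReads` helper is a hypothetical name. The writer only ever stores values whose upper and lower 32-bit halves are equal, so a reader that saw mismatched halves would have observed a torn, non-atomic access.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tornReads stores and loads a 64-bit word concurrently and counts how many
// loads observed mismatched 32-bit halves. With atomic 64-bit accesses the
// count stays at zero.
func tornReads(iterations int) int {
	var word uint64
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			h := uint64(uint32(i))
			atomic.StoreUint64(&word, h<<32|h) // both halves carry the same value
		}
	}()
	torn := 0
	for i := 0; i < iterations; i++ {
		v := atomic.LoadUint64(&word)
		if uint32(v>>32) != uint32(v) {
			torn++
		}
	}
	wg.Wait()
	return torn
}

func main() {
	fmt.Println("torn reads:", tornReads(100000))
}
```

Note that the `sync/atomic` documentation makes the caller responsible for 64-bit alignment of such words on 32-bit ARM, which hints at why 64-bit atomics deserve attention on the 32-bit targets discussed here.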

Owen Waller

Feb 24, 2017, 5:25:25 PM
to Dmitry Vyukov, Ian Lance Taylor, golang-nuts
Hi Dmitry,

> ThreadSanitizer does not work on any 32-bit platforms. It assumes that it can reserve huge contiguous chunks of address space for various things.
>
> Tsan reserves 4X for shadow memory (where X is the amount of memory where user data resides), 0.5X for another shadow memory, 1TB for heap and 128GB for thread info. You can see the details in llvm/projects/compiler-rt/lib/tsan/rtl/tsan_platform.h.
>
> Those ranges are well beyond the 32-bit address space. But, having had a quick look at the tsan_platform.h file, it looks like the file contains memory layouts for hardware with 39, 42, 44, 46 and 64 bit address spaces, at least for C/C++ programs if not Go.
>
> Given that these ranges are just defined as constants in the code, is there a fundamental reason why these ranges could not be shrunk to fit into a 32-bit address space? Does the thread sanitizer require some (assembly) instructions that are only found in 64-bit hardware, for example?
>
> No fundamental reason. Tsan requires 64-bit atomic loads and stores.

This leads me to ask two things.

Why are 64-bit atomic loads and stores required? To take an example, ARMv6 cores have had atomic loads and stores for a very long time[1]. But being a 32-bit core that's usually attached to a 32-bit memory bus, the instructions are 32-bit. So is it just that an atomic pair of load and store operations is required? I am of course assuming that gcc or clang on these platforms can make use of these instructions.

If all that is needed are atomic loads and stores, then that leads to the second question. Why hasn't a 32-bit port (with a reduced memory map) of the thread sanitizer already appeared? There are lots of other 32-bit cores with atomic loads and stores: MIPS32, PPC, SPARC etc.

At the minute I feel I am missing something _very_ important. Otherwise this looks like it should be a solved problem.


Owen

Dmitry Vyukov

Feb 27, 2017, 4:53:50 AM
to go-...@kulawe.com, Ian Lance Taylor, golang-nuts
Yes, it needs just atomic loads and stores.

> If all that is needed are atomic loads and stores then that leads to the
> second question. Why hasn't a 32-bit port (with a reduced memory map) of the
> thread sanitizer already appeared? There are lots of other 32-bit cores with
> atomic loads and stores - MIPS32, PPC, SPARC etc...
>
> At the minute I feel I am missing something _very_ important. Otherwise this
> looks like it should be a solved problem.

A 32-bit version would be quite restrictive wrt the amount of memory
an app can use. Tsan can have up to 10x overhead; with Go's GC
overhead this goes up to 20x.

But I think the real reason is that nobody was interested enough in it
to implement it.

There can also be some hidden problems that I don't see (we never
seriously considered porting tsan to 32-bits).
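
To put rough numbers on that restriction, the sketch below divides an available amount of memory by the overhead factor. It assumes "10x overhead" means the instrumented process needs about ten times the application's own memory, and it uses a hypothetical 3072 MiB of usable user address space on a 32-bit system; both are assumptions for illustration, not figures from the thread.

```go
package main

import "fmt"

const MiB = uint64(1) << 20

// appBudget returns how many bytes the application itself could use if the
// instrumented process needs overhead times that amount and only total
// bytes are available.
func appBudget(total, overhead uint64) uint64 {
	return total / overhead
}

func main() {
	total := 3072 * MiB // hypothetical usable address space on a 32-bit system
	for _, o := range []uint64{10, 20} {
		fmt.Printf("%dx overhead: about %d MiB for the app\n", o, appBudget(total, o)/MiB)
	}
}
```

Under these assumptions the application would be limited to a few hundred megabytes at best, which is in the same range as the embedded workloads mentioned earlier in the thread.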

Owen Waller

Feb 27, 2017, 6:45:24 PM
to Dmitry Vyukov, Ian Lance Taylor, golang-nuts
> Why are 64-bit atomic loads and stores required? To take an example, ARMv6 cores have had atomic loads and stores for a very long time[1]. But being a 32-bit core that's usually attached to a 32-bit memory bus, the instructions are 32-bit. So is it just that an atomic pair of load and store operations is required? I am of course assuming that gcc or clang on these platforms can make use of these instructions.
>
> Yes, it needs just atomic loads and stores.

Good, so in theory almost any modern processor architecture can be supported.

> If all that is needed are atomic loads and stores then that leads to the second question. Why hasn't a 32-bit port (with a reduced memory map) of the thread sanitizer already appeared? There are lots of other 32-bit cores with atomic loads and stores - MIPS32, PPC, SPARC etc... At the minute I feel I am missing something _very_ important. Otherwise this looks like it should be a solved problem.
>
> A 32-bit version would be quite restrictive wrt the amount of memory an app can use. Tsan can have up to 10x overhead; with Go's GC overhead this goes up to 20x. But I think the real reason is that nobody was interested enough in it to implement it. There can also be some hidden problems that I don't see (we never seriously considered porting tsan to 32-bits).

So, next dumb question.

How did you work out the memory mappings for 64-bit systems, and do you want to take a stab at what the mappings might be for a 32-bit system? I know the memory map will be smaller and it will restrict things, but it might be enough for what I need (and possibly for others working on small embedded hardware).

But this seems to be heading towards a conclusion of "Somebody just needs to try it, and see what happens."
So in that case:

a) What do I need to do to rebuild the lib? Do I need clang or will gcc do?
b) Is the library difficult to rebuild? Is there anything special I need to know?
c) The memory maps are laid out using "unsigned long long" types (as would be expected). I am assuming that in a 32-bit world I can just leave these alone, rather than convert them to unsigned long, and take the overhead of both size and performance?
d) Can you think of anything else I might have to change in the code?

Owen

Dmitry Vyukov

Feb 28, 2017, 2:30:50 AM
to go-...@kulawe.com, Ian Lance Taylor, golang-nuts, thread-s...@googlegroups.com
The main goal of the current mapping is to make the MemToShadow function
fast (no memory accesses, no branching).
For starters you can take any simple mapping at the cost of making
MemToShadow slower.
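
The trade-off described here can be illustrated with a toy model in Go. All constants below (an 8-byte cell, a single application region at `appBase`, shadow at `shadowBase`) are hypothetical values chosen for the example and are not the real tsan layout; only the 4x shadow ratio comes from this thread. The first function uses pure arithmetic, the second trades speed for flexibility by searching a region table.

```go
package main

import "fmt"

const (
	cellSize   uint64 = 8          // hypothetical granularity of one shadow cell
	shadowMult uint64 = 4          // 4x shadow, as stated in the thread
	appBase    uint64 = 0x10000000 // hypothetical start of application memory
	shadowBase uint64 = 0x40000000 // hypothetical start of shadow memory
)

// memToShadow maps an application address to its shadow address using only
// bit operations and arithmetic: no memory accesses and no branches.
func memToShadow(addr uint64) uint64 {
	return shadowBase + ((addr&^(cellSize-1))-appBase)*shadowMult
}

// region describes one application range and where its shadow starts.
type region struct {
	appStart, appEnd, shadowStart uint64
}

// memToShadowSlow handles several regions by searching a table, which is
// simpler to lay out but costs memory accesses and branches on every call.
func memToShadowSlow(regions []region, addr uint64) (uint64, bool) {
	for _, r := range regions {
		if addr >= r.appStart && addr < r.appEnd {
			return r.shadowStart + ((addr&^(cellSize-1))-r.appStart)*shadowMult, true
		}
	}
	return 0, false
}

func main() {
	regions := []region{{appBase, appBase + 0x10000000, shadowBase}}
	addr := appBase + 13
	slow, ok := memToShadowSlow(regions, addr)
	fmt.Printf("fast=%#x slow=%#x ok=%v\n", memToShadow(addr), slow, ok)
}
```

A first 32-bit experiment could start with something like the table-driven form and only move to a branch-free formula once a workable layout is known.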


> and do you want
> to take a stab at what the mappings might be for a 32 bit system?

I don't have time to work on this.

> I know the
> memory map will be smaller and it will restrict things but it might be
> enough for what I need (and possibly others working on small embedded
> hardware).
>
> But this seems to be heading towards a conclusion of "Somebody just needs
> to try it, and see what happens."
> So in that case:
>
> a) What do I need to do to rebuild the lib? Do I need clang or will gcc do?
> b) Is the library difficult to rebuild? Is there anything special I need to
> know?

You need clang as it contains the master copy of tsan runtime. Here
are some building instructions:
https://github.com/google/sanitizers/wiki/AddressSanitizerHowToBuild


> c) The memory maps are laid out using "unsigned long long" types (as would
> be expected). I am assuming that in a 32-bit world I can just leave these
> alone, rather than convert them to unsigned long, and take the overhead of
> both size and performance?

Yes, they just need to be 64-bit wide.

> d) Can you think of anything else I might have to change in the code?

Nothing particular comes to mind...

Owen Waller

Feb 28, 2017, 6:01:46 PM
to Dmitry Vyukov, Ian Lance Taylor, golang-nuts, thread-s...@googlegroups.com
Hi Dmitry,

> The main goal of the current mapping is to make the MemToShadow function
> fast (no memory accesses, no branching).
> For starters you can take any simple mapping at the cost of making
> MemToShadow slower.

OK, that's good, as I wasn't intending to do anything apart from put in the new memory map, rebuild things and then try it against some sort of hello world example.

I do not want to be changing any code beyond this if I can avoid it.

> and do you want to take a stab at what the mappings might be for a 32-bit system?
>
> I don't have time to work on this.

I understand that. But at the minute, I don't understand how the numbers in the memory map are arrived at. Is the process documented somewhere so I can work this out for myself?

> You need clang as it contains the master copy of tsan runtime. Here
> are some building instructions:
> https://github.com/google/sanitizers/wiki/AddressSanitizerHowToBuild

Thank you for this. It looks like I'll need to set up clang etc. first.

Owen

Dmitry Vyukov

Mar 1, 2017, 3:43:57 AM
to go-...@kulawe.com, Ian Lance Taylor, golang-nuts, thread-s...@googlegroups.com
On Wed, Mar 1, 2017 at 2:00 AM, Owen Waller <go-...@kulawe.com> wrote:
> Hi Dmitry,
>
> The main goal of the current mapping is to make the MemToShadow function
> fast (no memory accesses, no branching).
> For starters you can take any simple mapping at the cost of making
> MemToShadow slower.
>
> OK, that's good, as I wasn't intending to do anything apart from put in the
> new memory map, rebuild things and then try it against some sort of hello
> world example.
>
> I do not want to be changing any code beyond this if I can avoid it.
>
> and do you want
> to take a stab at what the mappings might be for a 32-bit system?
>
> I don't have time to work on this.
>
> I understand that. But at the minute, I don't understand how the numbers in
> the memory map are arrived at. Is the process documented somewhere so I can
> work this out for myself?


The mapping needs to satisfy the following requirements:
1. For all user memory regions there are 4x shadow regions.
2. For all user memory regions there are 0.5x meta shadow regions.
3. There is a region for thread traces.
4. Possibly a region for the internal heap.
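
The requirements above can be turned into a small sanity check for a candidate layout. The sketch below is a hypothetical helper, not part of tsan: the `Range` and `Layout` types, the `Check` method and the example addresses are all invented for illustration. It checks the first three requirements plus two things they imply (regions fit in the address space and do not overlap); the optional internal heap region is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Range is a half-open address range [Start, End).
type Range struct{ Start, End uint64 }

// Size returns the number of bytes in the range.
func (r Range) Size() uint64 { return r.End - r.Start }

// Layout is a hypothetical candidate memory map. Shadow[i] and Meta[i]
// belong to User[i].
type Layout struct {
	User   []Range
	Shadow []Range
	Meta   []Range
	Traces Range
}

// Check validates the layout against an address space of addrBits bits
// (addrBits must be below 64).
func (l Layout) Check(addrBits uint) error {
	if len(l.Shadow) != len(l.User) || len(l.Meta) != len(l.User) {
		return errors.New("each user region needs one shadow and one meta region")
	}
	all := []Range{l.Traces} // requirement 3: a trace region must exist
	for i := range l.User {
		all = append(all, l.User[i], l.Shadow[i], l.Meta[i])
	}
	limit := uint64(1) << addrBits
	for i, a := range all {
		if a.Start >= a.End || a.End > limit {
			return fmt.Errorf("region %#x-%#x is empty or outside the address space", a.Start, a.End)
		}
		for _, b := range all[i+1:] {
			if a.Start < b.End && b.Start < a.End {
				return fmt.Errorf("regions %#x-%#x and %#x-%#x overlap", a.Start, a.End, b.Start, b.End)
			}
		}
	}
	for i, u := range l.User {
		if l.Shadow[i].Size() < 4*u.Size() { // requirement 1: 4x shadow
			return fmt.Errorf("shadow for user region %d is smaller than 4x", i)
		}
		if 2*l.Meta[i].Size() < u.Size() { // requirement 2: 0.5x meta shadow
			return fmt.Errorf("meta shadow for user region %d is smaller than 0.5x", i)
		}
	}
	return nil
}

// example32 returns a made-up layout that fits in a 32-bit address space.
func example32() Layout {
	return Layout{
		User:   []Range{{0x00010000, 0x10010000}}, // 256 MiB of user memory
		Shadow: []Range{{0x40000000, 0x80000000}}, // 1 GiB, 4x
		Meta:   []Range{{0x80000000, 0x88000000}}, // 128 MiB, 0.5x
		Traces: Range{0x90000000, 0xA0000000},
	}
}

func main() {
	fmt.Println("example layout valid:", example32().Check(32) == nil)
}
```

A checker like this only confirms that the arithmetic of a proposed map is consistent; whether the operating system actually leaves those ranges free on a given 32-bit platform would still have to be tested on the hardware.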