Android ELF TLS and the Go Mobile g register


rpri...@google.com

Nov 15, 2018, 8:46:36 PM
to golang-dev
I'm working on adding support for ELF TLS to Android, and the Go runtime is causing some trouble with the way it uses TLS memory for Go's g register. On each Android architecture (x86-32, x86-64, arm32, arm64), the runtime uses a pthread key that it assumes is located at a positive offset from the thread pointer (TP/tlsbase). Conforming to the arm{32,64} TLS ABIs, however, generally requires moving Bionic's pthread keys before the thread pointer, which would break all Go Android apps. It would simplify Bionic if it used the same ARM TLS layout on x86 too, which would then break Go on x86.

Go's TLS usage is different on ARM versus x86. It appears that (most?) every Go-compiled function accesses the g register in its function prologue. For the 4 gomobile Android targets, the g register is allocated to:
  • arm32: r10
  • arm64: x28
  • x86-32: TP+0xf8
  • x86-64: TP+0x1d0
On arm{32,64}, Go stores the g register in a GPR and saves/restores it to TLS memory at the C/Go interface. On x86 targets, the g register is stored directly in TLS memory. I'd guess this ARM-vs-x86 difference is a result of ARM having more registers to spare. x86-64 and arm32 both have 16 GPRs, though, and I saw a suggestion to reserve a GPR for x86-64.

On ARM, the g register is saved/restored to a runtime.tls_g variable at the C/Go interface. runtime.tls_g is an STT_TLS symbol on OSes that support it (e.g. Linux) and an ordinary variable elsewhere (e.g. Android/Darwin). When runtime.tls_g is an ordinary variable, the cgo inittls function initializes it to a TP offset by:
  1. creating a pthread key
  2. setting the key to a magic value
  3. searching for the key's address by scanning forward from the TP
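
The three steps above can be sketched in C. This is a simplified model, not the Go runtime's actual code: to stay portable, it scans an ordinary array that stands in for the memory following the thread pointer. The names MAGIC, MAX_SCAN_WORDS, and find_tls_offset are made up for illustration. A real implementation would create the key with pthread_key_create, store the magic value with pthread_setspecific, and scan from the real TP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and values, chosen for this sketch only. */
#define MAGIC ((uintptr_t)0x5a5a1234u)
#define MAX_SCAN_WORDS 128

/*
 * Scan forward from tlsbase, one word at a time, looking for the magic
 * value that was stored through the pthread key. Returns the byte offset
 * of the matching word relative to tlsbase, or -1 if it was not found
 * within the scanned range.
 */
ptrdiff_t find_tls_offset(const uintptr_t *tlsbase, size_t nwords)
{
    assert(tlsbase != NULL);
    if (nwords > MAX_SCAN_WORDS)
        nwords = MAX_SCAN_WORDS;
    for (size_t i = 0; i < nwords; i++) {
        if (tlsbase[i] == MAGIC)
            return (ptrdiff_t)(i * sizeof(uintptr_t));
    }
    return -1;
}
```

The scan only ever looks at non-negative offsets, which is why moving the pthread keys before the TP would make a search like this come up empty.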

On x86-{32,64}, the g register appears to be allocated directly to a runtime.tlsg STT_TLS symbol (not runtime.tls_g) and accessed from each prologue using either a local-exec (LE) or an initial-exec (IE) instruction sequence. LE is a bit more efficient and is used in Linux executables. Go uses IE in Linux solibs, but that's only guaranteed to work if the solib is part of the initial set of loaded modules. If the solib is loaded dynamically, then the TLS memory is taken from a surplus of static TLS memory. glibc reserves 1-2 kilobytes, musl reserves none, and my current Bionic prototype also reserves none.
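
To make the LE/IE distinction concrete: GCC and Clang let C code choose the TLS access model per variable with the tls_model attribute. The sketch below is only an analogy for how runtime.tlsg is accessed, not Go's code. The variable and function names are invented for illustration, and it assumes a GCC-compatible compiler building an executable for an ELF target.

```c
#include <assert.h>
#include <stddef.h>

/* local-exec: the TP offset is a link-time constant baked into the code.
   Only valid for code linked into the main executable. */
static __thread void *g_le __attribute__((tls_model("local-exec")));

/* initial-exec: the TP offset is read from a GOT entry that the dynamic
   loader fills in at load time. In a shared library this needs the
   library's TLS to fit in static TLS. */
static __thread void *g_ie __attribute__((tls_model("initial-exec")));

void set_g_le(void *g) { g_le = g; }
void *get_g_le(void)   { return g_le; }
void set_g_ie(void *g) { g_ie = g; }
void *get_g_ie(void)   { return g_ie; }

/* Round trip through both variables on the calling thread. */
void demo_roundtrip(void)
{
    int a, b;
    set_g_le(&a);
    set_g_ie(&b);
    assert(get_g_le() == &a);
    assert(get_g_ie() == &b);
}
```

Comparing the disassembly of get_g_le and get_g_ie in a position-independent build should show the extra GOT load that makes IE slightly slower than LE.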

On Android/x86, Go produces an solib for an app, which is always loaded dynamically. Go can't use IE relocations (R_TLS_TPOFF) because Bionic's loader doesn't support them yet, and Bionic might not have surplus static TLS memory anyway. Instead, Go uses an LE access, and the solibs have a hard-coded access of either TP+0xf8 or TP+0x1d0 in every function prologue. There is a cgo inittls function that tries to reserve a pthread key matching the fixed offset.

FWIW: In Android L, a newly created pthread key was guaranteed to have a value of zero. Starting in Android M, however, pthread keys use a lock-free system where a new key is lazily zero-initialized. If a key is recycled before Go is initialized (unlikely?) then g won't be zero-initialized, and maybe that breaks something? Each key (pthread_key_data_t) is a sequence number followed by the key data; I suspect the x86 inittls only works on M+ when BIONIC_TLS_SLOTS is an odd number.
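
The odd-slot-count suspicion can be checked with a little arithmetic. The sketch below assumes a layout that is not spelled out above: the key array starts immediately after BIONIC_TLS_SLOTS word-sized slots at the TP, and each key occupies two words, the sequence number followed by the data. Under that model, key i's data word sits at word index slots + 2*i + 1. The function name and the slot counts used in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the index of the key whose data word lands at byte_offset from
 * the TP, or -1 if no key's data word can land there.
 * Assumed layout: [slots words][seq0][data0][seq1][data1]...
 */
long key_index_for_offset(size_t slots, size_t word_size, size_t byte_offset)
{
    assert(word_size != 0);
    if (byte_offset % word_size != 0)
        return -1;                 /* not word-aligned */
    size_t word = byte_offset / word_size;
    if (word < slots + 1)
        return -1;                 /* inside the slot array or on key 0's seq word */
    size_t rel = word - slots;     /* word position within the key array */
    if (rel % 2 == 0)
        return -1;                 /* even position: a sequence number, not data */
    return (long)((rel - 1) / 2);
}
```

With the offsets from this thread, 0x1d0 / 8 = 58 and 0xf8 / 4 = 62 are both even word indices, so under this model a matching key exists only when the slot count is odd. For example, a hypothetical slot count of 9 yields key 24 on x86-64, while a slot count of 8 yields no match.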

It'd be nice if we could find a more robust way to run Go on Android. If anyone has ideas, that'd be great.

For ARM, I can imagine adding an API to Bionic that reserves a word of static TLS memory (memory at a fixed TP offset):
  • int pthread_alloc_tls_word_np(intptr_t* offset), or maybe
  • void* pthread_alloc_tls_word_np()
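
As a rough illustration of how a runtime might consume the first signature, the sketch below mocks it. Nothing here exists in Bionic: mock_alloc_tls_word stands in for the proposed pthread_alloc_tls_word_np, and fake_tp_block stands in for one thread's static TLS area, with the "thread pointer" placed at its start. Real code would add the returned offset to the actual TP, and on ARM the offset could just as well be negative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_TLS_WORDS 8   /* hypothetical capacity of the reserved area */

static uintptr_t fake_tp_block[FAKE_TLS_WORDS];
static size_t next_free_word;

/* Mock of the proposed API: hands out one word and reports its byte offset
   from the (fake) thread pointer. Returns 0 on success or an errno value. */
int mock_alloc_tls_word(intptr_t *offset)
{
    assert(offset != NULL);
    if (next_free_word >= FAKE_TLS_WORDS)
        return ENOMEM;
    *offset = (intptr_t)(next_free_word * sizeof(uintptr_t));
    next_free_word++;
    return 0;
}

/* What a runtime could do at the C/Go boundary: save g to, and reload it
   from, the reserved word at TP + offset. */
void save_g(intptr_t offset, void *g)
{
    memcpy((char *)fake_tp_block + offset, &g, sizeof g);
}

void *load_g(intptr_t offset)
{
    void *g;
    memcpy(&g, (char *)fake_tp_block + offset, sizeof g);
    return g;
}
```

Because the offset is obtained once at startup and is the same for every thread, the runtime could cache it in an ordinary variable, much as runtime.tls_g is used on ARM today.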

In principle, Go could use an API like this on x86-{32,64} too, but it would need to switch away from LE accesses. It would probably need to use an ordinary load of an ordinary variable (i.e. each prologue would access runtime.tls_g). It would be slower than LE, but probably no worse than running an ordinary solib on (non-Android) Linux.

If Bionic did support surplus static TLS memory, then I think Go could opt into it on new Android versions, using dlsym() to find its TLS storage rather than the above API. It could only use TLS_TPOFF relocations at the cost of dropping support for existing versions of Android, though.

Other things that might help:
  • Switch x86-64 to arm32's design (reserve a fixed register)
  • Drop support for Android/x86-32 or (if Bionic has surplus static TLS) limit x86-32 support to newer Android versions

If it helps, I wrote a document describing Android ELF TLS and some issues it has with TLS memory layout (including a potential workaround for this Go issue):

Any thoughts / corrections?

elias...@gmail.com

Nov 16, 2018, 9:16:18 AM
to golang-dev
Thank you for bringing this up in time for a well thought-out fix and for including the unusual Go runtime requirements in your design considerations.

It seems there are multiple options on both the Bionic and Go side to solve this issue, and I think others are more qualified to choose between them. However, I think it is reasonable to break existing Gomobile programs, provided that

 - A fix to the Go runtime can be made such that Go can run on both old and new versions of Android.
 - That fix is simple enough, or targeted enough, to be included in Go 1.12 slated for a February release.
 - The breaking version of Android is months away from Go 1.12.

If so, I think we can warn gomobile users on golang-nuts and in the release notes to rebuild their gomobile apps in time for the next Android.

Furthermore, I think it's reasonable to drop android/386 provided there is no practical use for the architecture outside the emulators. If so, I don't see a problem in requiring 64-bit emulator images for testing gomobile apps.

 - elias

rpri...@google.com

Jan 14, 2019, 4:38:09 PM
to golang-dev
FWIW: I filed an issue in the Go bug tracker: https://github.com/golang/go/issues/29674.