I'm working on adding ELF TLS support to Android, and the Go runtime is causing some trouble with the way it uses TLS memory for Go's g register. On each Android architecture (x86-32, x86-64, arm32, arm64), it uses a pthread key that it assumes is located at a positive offset from the thread pointer (TP/tlsbase). Conforming to the arm{32,64} TLS ABIs generally requires moving Bionic's pthread keys before the thread pointer, though, which would break all Go Android apps. It would simplify Bionic to use the same ARM TLS layout on x86 too, which would then break Go on x86 as well.
Go's TLS usage is different on ARM versus x86. It appears that every (or nearly every?) Go-compiled function accesses the g register in its function prologue. For the 4 gomobile Android targets, the g register is allocated to:
On arm{32,64}, Go stores the g register in a GPR and saves/restores it to TLS memory at the C/Go interface. On x86 targets, the g register is stored directly in TLS memory. I'd guess this ARM-vs-x86 difference is a result of ARM having more registers to spare. x86-64 and arm32 both have 16 GPRs, though, and I saw a suggestion to reserve a GPR for x86-64.
On ARM, the g register is saved/restored to a runtime.tls_g variable at the C/Go interface. runtime.tls_g is an STT_TLS symbol on OSes that support it (e.g. Linux) and an ordinary variable elsewhere (e.g. Android/Darwin). When runtime.tls_g is an ordinary variable, the cgo inittls function initializes it to a TP offset by:
- creating a pthread key
- setting the key to a magic value
- searching for the key's address by scanning forward from the TP
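To make the scan step concrete, here is a minimal sketch in C. It is a model, not the actual cgo inittls code: the per-thread block is a plain array, and `MAGIC`, `find_key_offset`, and `demo_scan` are names made up for illustration. It shows why the approach depends on the key's data living after the TP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker value for this sketch (not necessarily what Go uses). */
#define MAGIC ((uintptr_t)0x23581321u)

/* Scan nwords words forward from tp. Return the byte offset of the first
 * word equal to magic, or -1 if it is not found in that range. */
ptrdiff_t find_key_offset(const uintptr_t *tp, size_t nwords, uintptr_t magic)
{
    assert(tp != NULL);
    for (size_t i = 0; i < nwords; i++) {
        if (tp[i] == magic)
            return (ptrdiff_t)(i * sizeof(uintptr_t));
    }
    return -1;
}

/* Model a 32-word per-thread block. tp_index is where the TP points and
 * slot_index is where the key's data lives. */
ptrdiff_t demo_scan(size_t slot_index, size_t tp_index)
{
    uintptr_t block[32] = {0};
    assert(slot_index < 32 && tp_index < 32);
    block[slot_index] = MAGIC; /* stands in for pthread_setspecific(key, MAGIC) */
    return find_key_offset(&block[tp_index], 32 - tp_index, MAGIC);
}
```

In this model, `demo_scan(20, 16)` finds the key four words past the TP, while `demo_scan(8, 16)` (key data placed before the TP) returns -1 because a forward scan never sees it.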
On x86-{32,64}, the g register appears to be allocated directly to a runtime.tlsg STT_TLS symbol (not runtime.tls_g) and accessed from each prologue using either a local-exec (LE) or an initial-exec (IE) instruction sequence. LE is a bit more efficient and is used in Linux executables. Go uses IE in Linux solibs, but that's only guaranteed to work if the solib is part of the initial set of loaded modules. If the solib is loaded dynamically, then its TLS memory is taken from a surplus of static TLS memory. glibc reserves 1-2 kilobytes, musl reserves none, and my current Bionic prototype also reserves none.
On Android/x86, Go produces an solib for an app, which is always loaded dynamically. Go can't use IE relocations (R_TLS_TPOFF) because Bionic's loader doesn't support them yet, and Bionic might not have surplus static TLS memory anyway. Instead, Go uses an LE access, and the solibs have a hard-coded access of either TP+0xf8 or TP+0x1d0 in every function prologue. There is a cgo inittls function that tries to reserve a pthread key matching the fixed offset.
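One way such a reservation could work is to keep creating keys until one lands on the slot behind the fixed offset, then release the extras. The sketch below models that idea only; it is not Bionic's allocator or Go's inittls. `struct key_model`, `model_key_create`, `model_key_delete`, and `reserve_fixed_slot` are hypothetical stand-ins, and "slot index" stands in for "the key whose data sits at the hard-coded TP offset".

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 16 /* hypothetical number of key slots in the model */

struct key_model {
    unsigned char in_use[NSLOTS];
};

/* Hand out the lowest free slot, or -1 if none is left. */
int model_key_create(struct key_model *m)
{
    for (int k = 0; k < NSLOTS; k++) {
        if (!m->in_use[k]) {
            m->in_use[k] = 1;
            return k;
        }
    }
    return -1;
}

void model_key_delete(struct key_model *m, int k)
{
    assert(k >= 0 && k < NSLOTS);
    m->in_use[k] = 0;
}

/* Create keys until one matches target, then free every non-matching key.
 * Returns target on success, or -1 if the slot could not be obtained. */
int reserve_fixed_slot(struct key_model *m, int target)
{
    int extras[NSLOTS];
    int n = 0, got = -1;

    assert(m != NULL && target >= 0 && target < NSLOTS);
    for (;;) {
        int k = model_key_create(m);
        if (k < 0)
            break;              /* ran out of keys without a match */
        if (k == target) {
            got = k;
            break;
        }
        extras[n++] = k;        /* remember keys to release afterwards */
    }
    while (n > 0)
        model_key_delete(m, extras[--n]);
    return got;
}
```

The model also shows the fragility: if some other library already holds the slot behind the fixed offset, `reserve_fixed_slot` returns -1 and there is nothing Go can do about it.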
FWIW: In Android L, a newly created pthread key was guaranteed to have a value of zero. Starting in Android M, however, pthread keys use a lock-free system where a new key is lazily zero-initialized. If a key is recycled before Go is initialized (unlikely?), then g won't be zero-initialized, and maybe that breaks something? Each key (pthread_key_data_t) is a sequence number followed by the key data; I suspect the x86 inittls only works on M+ when BIONIC_TLS_SLOTS is an odd number.
It'd be nice if we could find a more robust way to run Go on Android. If anyone has ideas, that'd be great.
For ARM, I can imagine adding an API to Bionic that reserves a word of static TLS memory (memory at a fixed TP offset):
- int pthread_alloc_tls_word_np(intptr_t* offset), or maybe
- void* pthread_alloc_tls_word_np()
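To illustrate how a runtime might consume the first variant, here is a rough sketch. Nothing in it exists in Bionic: `mock_alloc_tls_word` is a stand-in for the proposed `pthread_alloc_tls_word_np(intptr_t* offset)`, the thread block is a plain array, and `fake_tp`, `g_init`, `save_g`, and `load_g` are hypothetical names. The point is that the caller only stores the returned offset in an ordinary variable and never assumes which side of the TP the word is on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one thread's block; the TP points into the middle of it. */
#define BLOCK_WORDS 16
#define TP_INDEX 8
static uintptr_t thread_block[BLOCK_WORDS];
static int next_word = 1; /* next word to hand out, counted back from the TP */

static uintptr_t *fake_tp(void)
{
    return &thread_block[TP_INDEX];
}

/* Stand-in for the proposed API: reserve one word at a fixed TP offset.
 * This mock hands out words before the TP, so offsets are negative. */
int mock_alloc_tls_word(intptr_t *offset)
{
    assert(offset != NULL);
    if (next_word > TP_INDEX)
        return -1; /* no static TLS words left in the model */
    *offset = -(intptr_t)(next_word * sizeof(uintptr_t));
    next_word++;
    return 0;
}

/* Ordinary variable holding the TP offset, in the role of runtime.tls_g. */
static intptr_t tls_g_offset;

int g_init(void)
{
    return mock_alloc_tls_word(&tls_g_offset);
}

/* Save/restore g at the C/Go boundary through TP + offset. */
void save_g(uintptr_t g)
{
    *(uintptr_t *)((char *)fake_tp() + tls_g_offset) = g;
}

uintptr_t load_g(void)
{
    return *(uintptr_t *)((char *)fake_tp() + tls_g_offset);
}
```

A caller would run `g_init()` once at startup, then use `save_g` and `load_g` wherever it currently saves and restores g through runtime.tls_g.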
In principle, Go could use an API like this on x86-{32,64} too, but it would need to switch away from LE accesses. It would probably need to use an ordinary load of an ordinary variable (i.e. each prologue would access runtime.tls_g). It would be slower than LE, but probably no worse than running an ordinary solib on (non-Android) Linux.
If Bionic did support surplus static TLS memory, then I think Go could opt into it on new Android versions, using dlsym() to find its TLS storage rather than the above API. It could only use TLS_TPOFF relocations at the cost of dropping support for existing versions of Android, though.
Other things that might help:
- Switch x86-64 to arm32's design (reserve a fixed register)
- Drop support for Android/x86-32 or (if Bionic has surplus static TLS) limit x86-32 support to newer Android versions
If it helps, I wrote a document describing Android ELF TLS and some issues it has with TLS memory layout (including a potential workaround for this Go issue):
Any thoughts / corrections?