I don't think we can do it.
The main thing we have to keep in mind is that not everyone is using
TLSDESC. In fact, clang doesn't even support -mtls-dialect=gnu2.
If everyone switches to TLSDESC, then I am OK with dropping
optimizations for the old model.
But even with TLSDESC we still need linker relaxations. The TLSDESC idea
solves some of the GD -> IE cost in the case where the .so is not
dlopened, but that is it. Note that AArch64, which is TLSDESC-only, still
has relaxations.
So I am strongly against removing either non-TLSDESC support or support
for the relaxations.
Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Rui Ueyama <ru...@google.com> writes:
>> So I am strongly against removing either non-TLSDESC support or support
>> for the relaxations.
>>
>
> It's still pretty arguable. By default, compilers use the General Dynamic
> model with -fpic, and Initial Exec without -fpic.
It is more complicated than that. You can get all 4 modes with clang:
-------------------------------
__thread int bar = 42;
int *foo(void) { return &bar; }
-------------------------------
without -fPIC: local exec.
-------------------------------
extern __thread int bar;
int *foo(void) { return &bar; }
-------------------------------
without -fPIC: initial exec.
with -fPIC: general dynamic.
-------------------------------
__attribute__((visibility("hidden"))) extern __thread int bar;
int *foo(void) { return &bar; }
-------------------------------
with -fPIC: local dynamic.
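The four cases follow a simple rule. As a rough sketch (loosely modeled on the selection logic in LLVM's `TargetMachine::getTLSModel`; the function and enum names here are made up for illustration), the compiler picks the cheapest model it can prove correct from two facts: whether the code is going into a shared library, and whether the variable's definition is known to be in the same module. -fPIE code counts as executable code here.

```c
#include <assert.h>
#include <stdbool.h>

/* Ordered from least to most restrictive, so a "larger" value means a
 * faster, more constrained model. */
enum tls_model {
    TLS_GENERAL_DYNAMIC,
    TLS_LOCAL_DYNAMIC,
    TLS_INITIAL_EXEC,
    TLS_LOCAL_EXEC
};

/*
 * building_shared_lib: code compiled with -fPIC for a .so (-fPIE counts
 *                      as executable code).
 * dso_local:           the definition is known to be in the same module,
 *                      e.g. defined in this file without -fPIC, or declared
 *                      with hidden visibility.
 * requested:           model from -ftls-model= or tls_model(...); it only
 *                      wins if it is more restrictive than the default.
 */
enum tls_model pick_tls_model(bool building_shared_lib, bool dso_local,
                              enum tls_model requested) {
    assert(requested <= TLS_LOCAL_EXEC);
    enum tls_model model;
    if (building_shared_lib)
        model = dso_local ? TLS_LOCAL_DYNAMIC : TLS_GENERAL_DYNAMIC;
    else
        model = dso_local ? TLS_LOCAL_EXEC : TLS_INITIAL_EXEC;
    return requested > model ? requested : model;
}
```

Passing `TLS_GENERAL_DYNAMIC` as `requested` means "no override". The three snippets map to (false, true), (false, false)/(true, false) and (true, true) respectively.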
> lld doesn't do any relaxation
> if -shared is given. So, if you are creating a DSO, thread-local variables
> in the DSO are accessed using the General Dynamic model. No relaxations are
> involved.
There are not a lot of opportunities there. If one patches one access at
a time, LD is as expensive as GD. The linker also doesn't know if the .so
will be used with dlopen or not, so it cannot relax to IE. I guess a
linker could have a command-line option for the second part.
Now that I spell that out, it is easy to see TLSDESC's big
advantage: it can optimize the case the static linker cannot.
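To make that advantage concrete, here is a toy model in C. It is not the real ABI: the actual x86-64 and AArch64 resolvers use a special calling convention, and every name below is hypothetical. The point it shows is that the code in the .so always makes the same indirect call through a descriptor, and the dynamic linker, which does know whether the module was loaded at startup or dlopen'ed, decides at load time whether that call is a trivial "return a constant offset" or the slow, __tls_get_addr-like path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tlsdesc;
typedef ptrdiff_t (*tlsdesc_resolver)(const struct tlsdesc *);

/* Toy TLSDESC GOT slot: a resolver function and its argument. */
struct tlsdesc {
    tlsdesc_resolver resolve;
    uintptr_t arg;
};

/* Counts calls into the slow path, to show when it is taken. */
int slow_path_calls = 0;

/* Fast path: the module's TLS is in the static TLS block, so the offset
 * from the thread pointer is fixed once the module is loaded. */
static ptrdiff_t resolve_static(const struct tlsdesc *d) {
    return (ptrdiff_t)d->arg;
}

/* Stand-in for the slow path used for dlopen'ed modules: a real resolver
 * would find (and lazily allocate) the module's dynamic TLS block. */
static ptrdiff_t resolve_dynamic(const struct tlsdesc *d) {
    const ptrdiff_t pretend_block_offset = 4096; /* hypothetical placement */
    slow_path_calls++;
    return pretend_block_offset + (ptrdiff_t)d->arg;
}

/* What the dynamic linker does when it processes a TLSDESC relocation. */
void bind_tlsdesc(struct tlsdesc *d, int loaded_at_startup, uintptr_t offset) {
    d->resolve = loaded_at_startup ? resolve_static : resolve_dynamic;
    d->arg = offset;
}

/* The access sequence compiled into the .so is identical in both cases. */
ptrdiff_t tls_offset(const struct tlsdesc *d) {
    assert(d->resolve != NULL);
    return d->resolve(d);
}
```

Without descriptors, the choice between the fast and the slow sequence must be made when the .so is compiled or statically linked, and neither tool knows yet whether the library will be dlopen'ed.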
> If you are creating an executable and if your executable is not
> position-independent, you're using Initial Exec model by default which is
> as fast as variables accessed through GOT. If you really want to use Local
> Exec model, you can pass -ftls-model=local-exec to compilers.
But then all the used variables have to be defined in the same
executable. You can't have even one from a shared library (think errno).
The nice thing about linker relaxations is that they are very user
friendly. The linker is the first point in the toolchain where some
useful fact is known, and it can optimize the result with no user
intervention.
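A simplified sketch of that decision on x86-64, written in C for illustration (the names are hypothetical and this is not lld's code). It uses only facts the static linker has: whether it is producing a shared object, and whether the symbol's definition ends up in the output. `assume_not_dlopened` stands for the kind of command-line option suggested earlier in this message; no such lld flag is implied.

```c
#include <assert.h>
#include <stdbool.h>

enum tls_access { ACC_GD, ACC_LD, ACC_IE, ACC_LE };
enum output_kind { OUT_SHARED, OUT_EXECUTABLE }; /* PIE counts as executable */

/*
 * in:                  access model the compiler chose for this relocation.
 * defined_in_output:   the symbol's definition ends up in the output itself
 *                      (not, say, errno in libc.so).
 * assume_not_dlopened: hypothetical option letting a .so use Initial Exec
 *                      because it is promised never to be dlopen'ed.
 */
enum tls_access relax_tls_access(enum tls_access in, enum output_kind out,
                                 bool defined_in_output,
                                 bool assume_not_dlopened) {
    assert(in <= ACC_LE);
    if (out == OUT_SHARED) {
        /* As described in this thread, nothing is relaxed for -shared. */
        if (assume_not_dlopened && in == ACC_GD)
            return ACC_IE;
        return in;
    }
    /* Executable: loaded at startup, its TLS block at a fixed offset. */
    if (defined_in_output)
        return ACC_LE; /* GD, LD and IE all become a constant TP offset */
    if (in == ACC_GD)
        return ACC_IE; /* defined in a .so loaded at startup: use a GOT slot */
    return in;
}
```

This is why a -fPIC object linked into an executable loses little: every GD access to its own variables becomes LE, and accesses to something like errno become IE, with no change to the compiler flags.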
> So I don't see a strong reason to do a complicated instruction rewriting in
> the linker. I feel more like we should do whatever it is instructed to do
> by command line options and input object files. You are for example free to
> pass the -fPIC option to create object files and still let the linker
> create a non-PIC executable, even though these combinations don't make
> much sense and produce a slightly inefficient binary. If you don't like it,
> you can fix the compiler options. Thread-local variables can be considered
> in the same way, no?
They are considered in the same way: we also relax GOT accesses :-)
The proposal is making the linker worse for our users to make our lives
easier. I really don't think we should do it.
It is likely that we can code the existing optimizations in a simpler
way. Even if we cannot, I don't think we should remove them.
Linker relaxations are extremely convenient. We use the example you
gave (-fPIC .o in an executable) all the time in llvm. That way we build
only one .o that is used in lib/ and bin/.
Linker relaxations are also fundamental to how RISCV works.
Not sure what the impact of this would be. Does this mean that some
TLS relocations will no longer be supported? Or is it that they just
won't be optimized? How about static binaries? Don't they rely on
the local exec model?
Does this affect linking code generated by older compilers (say GCC
4.2.1) in any way?
I've skipped over the description and I have some difficulty sharing
this conclusion. I don't see how it makes any significant difference. I
also don't know if any system beyond glibc implements it.
Side note: position independent executables that are properly compiled
behave like non-position independent executables.
Side note 2: I strongly question the assertions about frequency of
dlopen vs direct linking from the TLSDESC paper. Quite a few hacks on
the dynamic linker side are a direct result of people wanting to dlopen
libGL from scripting languages like Python.
Joerg
For Arm (and I think MIPS is similar), there isn't any TLS relaxation
of instructions, as the TLS relocations act on data and not
instructions. There are some cases where dynamic relocations can be
omitted; for example, the module ID of an executable is defined to be 1,
so there is no need for the dynamic linker to fill it in. For static
linking, the linker knows the module and the offsets of all the TLS
symbols, so it can resolve all the dynamic relocations. I don't know off
the top of my head whether this would apply to other architectures,
although I think the same general principles should hold. The last
time I looked, relaxation was the technique used to support static
linking on targets other than Arm and MIPS.
I have a vague memory of the OpenGL folk being sensitive to TLS
performance, particularly as the library is often shared. I think that
TLS relaxation isn't going to show up in many traditional benchmarking
suites, as much of the performance-critical code is going to be in the
application, which is unlikely to use much TLS. I'm thinking
that it would need something like a real-world application that makes
heavy use of shared libraries with TLS (games, web browsers, or perhaps
HPC?).
Given that getting convincing data either way about the impact of TLS
relaxation could be difficult we should err towards keeping it.
Peter
On 8 November 2017 at 09:05, Mark Kettenis via llvm-dev
>> > If you are creating an executable and if your executable is not
>> > position-independent, you're using Initial Exec model by default which is
>> > as fast as variables accessed through GOT. If you really want to use Local
>> > Exec model, you can pass -ftls-model=local-exec to compilers.
>>
>> But then all the used variables have to be defined in the same
>> executable. You can't have even one from a shared library (think errno).
>>
>
> Not really -- you can still use Local Exec on a per-variable basis using the
> visibility attribute. I don't think that we can observe a noticeable
> difference in performance between Initial Exec and Local Exec except in a
> synthetic benchmark though.
There is nothing the linker can do that the compiler could not have
done in the first place. The point is that, to switch to lld and keep
performance, users should not have to annotate all TLS variables with
a tls_model.
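For reference, this is the kind of per-variable annotation being discussed, using the GCC/Clang `tls_model` and `visibility` attributes (the variable and function names are made up). Local Exec is forced on a variable known to be defined in the executable, while errno, which lives in libc, keeps whatever model the C library's headers give it. Having to do this for every hot TLS variable is the burden that linker relaxation removes.

```c
#include <assert.h>
#include <errno.h>

/* Defined in this executable, so Local Exec is valid. The attribute forces
 * it regardless of -fPIC/-fPIE; linking this object into a .so would
 * typically be rejected by the linker. */
__thread int request_count __attribute__((tls_model("local-exec"))) = 0;

/* Hidden visibility tells the compiler the definition is in this module:
 * Local Exec without -fPIC, Local Dynamic with -fPIC. */
__attribute__((visibility("hidden"))) __thread int last_status = 0;

int record_request(int status) {
    assert(status >= 0); /* this example assumes non-negative status codes */
    request_count++;
    last_status = status;
    /* errno is defined in libc, another module, so it cannot be Local Exec;
     * its access model is up to the C library and compiler defaults. */
    errno = 0;
    return request_count;
}
```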
>> The nice thing about linker relaxations is that they are very user
>> friendly. The linker is the first point in the toolchain where some
>> useful fact is known, and it can optimize the result with no user
>> intervention.
>
>
> I think I agree with this point. Automatic linker code relaxation is
> convenient and if it makes a difference, we should implement that. But I
> doubt that TLS relaxation is actually effective. George implemented them
> because there's a spec defining how to relax them, and I accepted the
> patches without thinking hard enough, but I didn't see a convincing
> benchmark result (or even a non-convincing one) that shows that these
> relaxations actually make real-world programs faster. Do you know of
> any? It is funny that even the creator of TLSDESC found that their
> optimization didn't actually make NPTL faster, as mentioned in the
> "Conclusion" section of
> http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt.
>
> So I don't think I'm proposing we simplify code by degrading user's code.
> It feels more like we are making too much effort on something that doesn't
> produce any measurable difference in real life.
*PLEASE* let us keep it. It is bad enough that we are regressing
performance in the name of having code that you find nicer. It would be
really annoying to see us drop a working feature just to reduce our
code a bit.
The code is working, please let it be!
At the very least we should keep it until we are in a position to
actually measure it. As is this is just guesswork. We would need a
*much* bigger adoption before we could measure this.
> On Tue, Nov 07, 2017 at 06:27:37PM -0800, Rui Ueyama via llvm-dev wrote:
>> tl;dr: TLSDESC has solved most problems in formerly inefficient TLS access
>> models, so I think we can drop TLS relaxation support from lld.
>
> I've skipped over the description and I have some difficulty sharing
> this conclusion. I don't see how it makes any significant difference. I
> also don't know if any system beyond glibc implements it.
musl implements it.
Cheers,
Rafael
In the OpenGL case it is primarily an effect of retrofitting thread
safety into existing APIs, just like some systems retrofit many of the
non-reentrant libc functions by using thread-local storage for their
buffers.
Joerg