Re: glibc 2.19 - async-signal safe TLS and ASan.

Kostya Serebryany

Jan 23, 2014, 10:35:03 AM
to Carlos O'Donell, GNU C Library, Roland McGrath, Paul Pluzhnikov, Andrew Hunter, address-...@googlegroups.com
[plain text now]

Thanks again for reaching out to us.
It turns out the new TLS implementation *is* a problem for us.
At least for LeakSanitizer (lsan), it will cause false positive leak reports.
Admittedly, lsan's current implementation has an ugly hack around TLS,
which was the major reason for filing
https://sourceware.org/bugzilla/show_bug.cgi?id=16291
In short, we treat __libc_memalign called from elf/dl-tls.c in a special way
which allows us to include the dynamic TLS into the leak detector's
memory root set.
More:
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/lsan/lsan_common_linux.cc?view=diff&r1=199899&r2=199900&pathrev=199900
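
A minimal sketch of that idea, not the actual lsan code: when an intercepted memalign call comes from inside the dynamic loader, record the returned block as a root region for the leak scan. All names here (RootRegion, set_linker_range, on_memalign, num_roots) are hypothetical, and the sketch assumes the loader's address range has already been found some other way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping; none of these names come from lsan or glibc. */
typedef struct {
  uintptr_t begin;
  size_t size;
} RootRegion;

enum { kMaxRoots = 64 };
static RootRegion g_roots[kMaxRoots];
static size_t g_num_roots;

/* Address range of the dynamic loader's code. Assumed to be discovered
   elsewhere; this sketch only stores it. */
static uintptr_t g_linker_begin, g_linker_end;

void set_linker_range(uintptr_t begin, uintptr_t end) {
  assert(begin <= end);
  g_linker_begin = begin;
  g_linker_end = end;
}

/* Meant to be called by a memalign interceptor after the real allocation.
   Returns 1 if the block was recorded as a leak-detector root, 0 otherwise. */
int on_memalign(uintptr_t caller_pc, const void *block, size_t size) {
  int from_linker = caller_pc >= g_linker_begin && caller_pc < g_linker_end;
  if (!from_linker || g_num_roots == kMaxRoots) return 0;
  g_roots[g_num_roots].begin = (uintptr_t)block;
  g_roots[g_num_roots].size = size;
  g_num_roots++;
  return 1;
}

size_t num_roots(void) { return g_num_roots; }
```

Only allocations whose caller lies in the loader become roots; every other memalign call is left to normal leak checking.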

I suggest we continue the discussion in
https://sourceware.org/bugzilla/show_bug.cgi?id=16291
unless you prefer otherwise. Let me post more details there.

I also suspect that the new TLS implementation may cause us trouble in
MemorySanitizer,
but I haven't tried to verify that yet.


--kcc



On Tue, Jan 21, 2014 at 4:27 AM, Carlos O'Donell <car...@redhat.com> wrote:
> Konstantin,
>
> I've forwarded your response to libc-alpha which I assume rejected
> your multi-part plain/html email.
>
> I've also corrected the small mistake that the next release is 2.19
> not 2.20. Sorry.
>
> +address-...@googlegroups.com
>
> Hi Carlos,
>
> Thanks for the heads up!
> I don't expect any impact on ASan from this change.
> We'd still test ASan with the new glibc to make sure.
>
> --kcc
>
>
>
> On Sat, Jan 11, 2014 at 7:39 AM, Carlos O'Donell <car...@redhat.com> wrote:
>
> Hello Konstantin!,
>
> You're getting this email because you're the only ASan expert I know,
> and I was at your talk at LFCS2013 ;-)
>
> We have a problem and we'd like your input if you have time.
>
> The GNU C Library version 2.20 (coming out at the end of the month)
> plans to stop using malloc for TLS allocations. The reason for this
> is that malloc is async-signal unsafe, and TLS accessed in signal
> handlers may need to allocate storage at the time of access. This
> is particularly true of signal handlers provided by dlopened shared
> libraries. There is no way to interpose yourself here because the
> non-malloc signal-safe allocator being used is internal to glibc.
>
> What kind of impact do you see this having on ASan?
>
> Do you see any way we can mitigate this impact?
>
> Cheers,
> Carlos.
>

Kostya Serebryany

Jan 24, 2014, 4:21:41 AM
to Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, Paul Pluzhnikov, address-...@googlegroups.com
On Fri, Jan 24, 2014 at 1:59 AM, Andrew Hunter <a...@google.com> wrote:
> FYI -- you could easily do the same thing with calls to
> signal_safe_memalign from libc as a stopgap. (Well, we'd need to
> export the symbols from libc, like I wanted to in the first place.)

Correct. If glibc lets us intercept __signal_safe_memalign&co we can
apply the same hack as we currently have.
It will not be a solution for
https://sourceware.org/bugzilla/show_bug.cgi?id=16291
but it will keep lsan and msan working with the new glibc.
How do I "export" __signal_safe_memalign? (I'd like to experiment myself)

(I've just verified that the new glibc indeed causes a false positive
with MemorySanitizer,
although to be fair we have it with the old glibc too, which is another reason
why we need https://sourceware.org/bugzilla/show_bug.cgi?id=16291)

--kcc

Kostya Serebryany

Jan 24, 2014, 2:05:25 PM
to Joseph S. Myers, Paul Pluzhnikov, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com



On Fri, Jan 24, 2014 at 9:53 PM, Joseph S. Myers <jos...@codesourcery.com> wrote:
> On Fri, 24 Jan 2014, Paul Pluzhnikov wrote:
>
> > I *think* exporting these symbols for 2.19 is the right thing to do.
>
> I don't think such symbols are a good idea for the public interface (as

Could you think of some other way to un-break LeakSanitizer in the 2.19 time frame?
(I suppose that https://sourceware.org/bugzilla/show_bug.cgi?id=16291 can not be properly implemented for 2.19).
I am sure we will find some even uglier hack to support both 2.19 and pre-2.19, but that will be most unfortunate...

Thanks!

--kcc

> opposed to GLIBC_PRIVATE) - and in any case, symbols should not be added
> to the public interface during release freeze, unless only for an
> architecture for which you are doing the release testing, because adding
> public symbols means updating all the ABI baselines and retesting in all
> relevant configurations.
>
> Specifically, I think of the present solution as an interim solution,
> until someone implements TLS in a way that does the allocation at
> pthread_create / dlopen time and avoids the possibility of allocation
> failure where an error cannot be returned.  And it's not obvious that such
> non-lazy allocation would need signal-safe allocators at all - meaning
> that glibc built for new architectures, not needing compatibility for any
> old binaries relying on overcommit for TLS variables, could then avoid
> including those allocators (presuming we keep them at all to make
> allocation for existing binaries signal-safe).
>
> Exporting symbols at GLIBC_PRIVATE is not a good solution for externally
> maintained projects because those shouldn't be using GLIBC_PRIVATE symbols
> at all.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

Kostya Serebryany

Jan 24, 2014, 3:52:20 PM
to Joseph S. Myers, Paul Pluzhnikov, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
[text-only]

Ondřej Bílka

Jan 24, 2014, 10:02:32 PM
to Paul Pluzhnikov, Kostya Serebryany, Joseph S. Myers, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
On Fri, Jan 24, 2014 at 05:12:28PM -0800, Paul Pluzhnikov wrote:
> On Fri, Jan 24, 2014 at 5:05 PM, Ondřej Bílka <nel...@seznam.cz> wrote:
>
> > A possible hack would be to override mmap and look for mmap calls
> > with dl_addr in the backtrace.
>
> That is unlikely to work:
>
> (gdb) disas __signal_safe_memalign
> Dump of assembler code for function __signal_safe_memalign:
> ...
> 0x00000000000103e2 <+114>: callq 0x18500 <mmap64>
> ...
>
> That is, the call to mmap64 does not go through PLT, and overriding it is
> just as difficult as overriding __signal_safe_memalign :-(
>
I did not consider this one. As mmap is quite slow, there is no deep
reason for the call to bypass the PLT.

By the same logic, we would also need to make the mmap call there go via the PLT.

Ondřej Bílka

Jan 24, 2014, 8:05:34 PM
to Kostya Serebryany, Joseph S. Myers, Paul Pluzhnikov, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
On Sat, Jan 25, 2014 at 12:52:20AM +0400, Kostya Serebryany wrote:
> [text-only]
>
> On Fri, Jan 24, 2014 at 9:53 PM, Joseph S. Myers
> <jos...@codesourcery.com> wrote:
> > On Fri, 24 Jan 2014, Paul Pluzhnikov wrote:
> >
> >> I *think* exporting these symbols for 2.19 is the right thing to do.
> >
> > I don't think such symbols are a good idea for the public interface (as
> Could you think of some other way to un-break LeakSanitizer in the
> 2.19 time frame?
> (I suppose that https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> can not be properly implemented for 2.19).
> I am sure we will find some even uglier hack to support both 2.19 and
> pre-2.19, but that will be most unfortunate...
>

Paul Pluzhnikov

Jan 24, 2014, 8:12:28 PM
to Ondřej Bílka, Kostya Serebryany, Joseph S. Myers, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
On Fri, Jan 24, 2014 at 5:05 PM, Ondřej Bílka <nel...@seznam.cz> wrote:

> A possible hack would be to override mmap and look for mmap calls
> with dl_addr in the backtrace.

That is unlikely to work:

(gdb) disas __signal_safe_memalign
Dump of assembler code for function __signal_safe_memalign:
...
0x00000000000103e2 <+114>: callq 0x18500 <mmap64>
...

That is, the call to mmap64 does not go through PLT, and overriding it is
just as difficult as overriding __signal_safe_memalign :-(


--
Paul Pluzhnikov

Kostya Serebryany

Jan 27, 2014, 8:37:31 AM
to Ondřej Bílka, Paul Pluzhnikov, Joseph S. Myers, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
Exporting mmap will probably have some other unpredictable
consequences and is even less likely to be acceptable at the "release
freeze" stage.

> > How do I "export" __signal_safe_memalign? (I'd like to experiment myself)
> You'll need this snippet of above patch: (line 61,5):

Paul, I tried this:

--- a/elf/Versions
+++ b/elf/Versions
@@ -62,5 +62,8 @@ ld {

# Pointer protection.
__pointer_chk_guard;
+ # for signal safe TLS
+ __signal_safe_malloc; __signal_safe_free; __signal_safe_memalign;
+ __signal_safe_realloc; __signal_safe_calloc;
}
}

it did not help: "nm libc.so | grep signal_safe_memalign" is empty.

--kcc

Paul Pluzhnikov

Jan 27, 2014, 10:37:23 AM
to Kostya Serebryany, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
On Mon, Jan 27, 2014 at 5:37 AM, Kostya Serebryany <k...@google.com> wrote:

>> > How do I "export" __signal_safe_memalign? (I'd like to experiment myself)
>> You'll need this snippet of above patch: (line 61,5):
>
> Paul, I tried this:
>
> --- a/elf/Versions
> +++ b/elf/Versions
> @@ -62,5 +62,8 @@ ld {
>
> # Pointer protection.
> __pointer_chk_guard;
> + # for signal safe TLS
> + __signal_safe_malloc; __signal_safe_free; __signal_safe_memalign;
> + __signal_safe_realloc; __signal_safe_calloc;
> }
> }
>
> it did not help: "nm libc.so | grep signal_safe_memalign" is empty.

It probably did. I believe these functions are defined and used in
'ld.so', so try 'nm ld.so | grep signal_safe'.


--
Paul Pluzhnikov

Kostya Serebryany

Jan 27, 2014, 10:47:31 AM
to Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
Indeed so, thanks!
So, exporting __signal_safe_memalign&co will allow us to extend the
existing hack to 2.19.
If this simple change can not be done for 2.19, can *anything* be done at all?
(Long term we'd still prefer something less hackish)

--kcc

>
>
> --
> Paul Pluzhnikov

Kostya Serebryany

Jan 29, 2014, 4:46:39 AM
to Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
FTR, I've implemented an even-uglier-than-before hack that deals with dynamic TLS in both <=2.18 and 2.19.
So, we will survive the 2.19 release.
But I would appreciate it if we could resolve https://sourceware.org/bugzilla/show_bug.cgi?id=16291
before the next one (2.20).

Kostya Serebryany

Jan 29, 2014, 4:48:31 AM
to Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, Carlos O'Donell, GNU C Library, Roland McGrath, address-...@googlegroups.com
[text only]

> Indeed so, thanks!
> So, exporting __signal_safe_memalign&co will allow us to extend the
> existing hack to 2.19.
> If this simple change can not be done for 2.19, can *anything* be done at all?
> (Long term we'd still prefer something less hackish)

Kostya Serebryany

Jan 30, 2014, 12:33:15 AM
to Carlos O'Donell, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
On Thu, Jan 30, 2014 at 9:23 AM, Carlos O'Donell <car...@redhat.com> wrote:
> Can you please describe the hack?

Intercept __tls_get_addr and __libc_memalign.
If __libc_memalign is called while we are inside __tls_get_addr, we
know we are in <= 2.18 mode and we know what to do.
If __libc_memalign was not called but the DSO ID passed to
__tls_get_addr was not seen before by the current thread,
we know that we are in 2.19 mode and that the TLS block was allocated
by __signal_safe_memalign, which
has a header with the actual block size. Ugly, as I said.
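
The decision logic above can be sketched roughly as follows. This is a simplified model, not the sanitizer source: the ThreadState struct, the on_* hooks and the DtlsOrigin enum are hypothetical names, the state would have to be kept per thread in a real interceptor, and the sketch does not read the __signal_safe_memalign header at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { kMaxModules = 128 };  /* arbitrary limit for this sketch */

typedef enum {
  DTLS_NONE,            /* nothing new was allocated by this call */
  DTLS_VIA_MEMALIGN,    /* <= 2.18 mode: block came from __libc_memalign */
  DTLS_VIA_SIGNAL_SAFE  /* 2.19 mode: block came from the signal-safe allocator */
} DtlsOrigin;

/* Hypothetical state; one instance per thread is assumed. */
typedef struct {
  bool in_tls_get_addr;
  bool memalign_seen;
  bool seen_module[kMaxModules];
} ThreadState;

/* Hook run when the __tls_get_addr interceptor is entered. */
void on_tls_get_addr_enter(ThreadState *t) {
  assert(t != NULL);
  t->in_tls_get_addr = true;
  t->memalign_seen = false;
}

/* Hook run from the __libc_memalign interceptor. */
void on_libc_memalign(ThreadState *t) {
  assert(t != NULL);
  if (t->in_tls_get_addr) t->memalign_seen = true;
}

/* Hook run when the __tls_get_addr interceptor returns. */
DtlsOrigin on_tls_get_addr_exit(ThreadState *t, size_t module_id) {
  assert(t != NULL);
  t->in_tls_get_addr = false;
  if (module_id >= kMaxModules) return DTLS_NONE;
  bool first_use = !t->seen_module[module_id];
  t->seen_module[module_id] = true;
  if (t->memalign_seen) return DTLS_VIA_MEMALIGN;
  if (first_use) return DTLS_VIA_SIGNAL_SAFE;
  return DTLS_NONE;
}
```

A memalign call seen outside __tls_get_addr is ignored, and a module ID that the thread has already seen yields DTLS_NONE on later calls.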

--kcc



>
> Cheers,
> Carlos.
>

Carlos O'Donell

Jan 30, 2014, 12:23:51 AM
to Kostya Serebryany, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
On 01/29/2014 04:48 AM, Kostya Serebryany wrote:
Can you please describe the hack?

Cheers,
Carlos.

Carlos O'Donell

Jan 30, 2014, 12:49:03 AM
to Kostya Serebryany, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
Not terrible at all, quite elegant actually.

Thanks.

Cheers,
Carlos.

Kostya Serebryany

Jan 30, 2014, 12:52:40 AM
to Carlos O'Donell, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
As a hack -- yes, maybe.
As a real solution for 2.20+ -- I hope we can do better.
Besides, adding an interceptor to __tls_get_addr means that dynamic
TLS under sanitizers
will become even slower.

--kcc

>
> Thanks.
>
> Cheers,
> Carlos.
>

Carlos O'Donell

Jan 30, 2014, 12:53:57 AM
to Kostya Serebryany, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
On 01/30/2014 12:52 AM, Kostya Serebryany wrote:
>> Not terrible at all, quite elegant actually.
>
> As a hack -- yes, maybe.

Sorry, you made it sound really bad, but the solution was elegant.

> As a real solution for 2.20+ -- I hope we can do better.
> Besides, adding an interceptor to __tls_get_addr means that dynamic
> TLS under sanitizers
> will become even slower.

I fully agree that we need a solution in 2.20+.

Cheers,
Carlos.

Rich Felker

Jan 30, 2014, 11:54:45 AM
to Carlos O'Donell, Kostya Serebryany, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
On Thu, Jan 30, 2014 at 12:49:03AM -0500, Carlos O'Donell wrote:
> On 01/30/2014 12:33 AM, Kostya Serebryany wrote:
> > On Thu, Jan 30, 2014 at 9:23 AM, Carlos O'Donell <car...@redhat.com> wrote:
> >>
> >> On 01/29/2014 04:48 AM, Kostya Serebryany wrote:
> >>> [text only]
> >>>
> >>>> Indeed so, thanks!
> >>>> So, exporting __signal_safe_memalign&co will allow us to extend the
> >>>> existing hack to 2.19.
> >>>> If this simple change can not be done for 2.19, can *anything* be done at all?
> >>>> (Long term we'd still prefer something less hackish)
> >>>
> >>> FTR, I've implemented an even-uglier-than-before hack that deals with
> >>> dynamic TLS in both <=2.18 and 2.19.
> >>> So, we will survive the 2.19 release.
> >>> But I would appreciate if we can resolve
> >>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
> >>> before the next one (2.20).
> >>
> >> Can you please describe the hack?
> >
> > intercept __tls_get_addr and __libc_memalign.
> > if __libc_memalign is called while we are inside __tls_get_addr, we
> > know we are in <= 2.18 mode and we know what to do.

Or a signal handler happened to interrupt __tls_get_addr and something
from the signal handler caused __libc_memalign to get called. Is this
case handled?

> > if __libc_memalign was not called but the DSO ID passed to
> > __tls_get_addr was not seen before by the current thread,
> > we know that we are in 2.19 mode and that the TLS block was allocated
> > by __signal_safe_memalign, which
> > has a header with the actual block size. Ugly, as I said.
>
> Not terrible at all, quite elegant actually.

I would say making assumptions about the format of the header is
pretty ugly and might even break in the future, especially if the
intent is to eventually allow overriding the AS-safe malloc
implementation...

Rich

Kostya Serebryany

Jan 30, 2014, 12:38:59 PM
to Rich Felker, Carlos O'Donell, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
On Thu, Jan 30, 2014 at 8:54 PM, Rich Felker <dal...@aerifal.cx> wrote:
> On Thu, Jan 30, 2014 at 12:49:03AM -0500, Carlos O'Donell wrote:
>> On 01/30/2014 12:33 AM, Kostya Serebryany wrote:
>> > On Thu, Jan 30, 2014 at 9:23 AM, Carlos O'Donell <car...@redhat.com> wrote:
>> >>
>> >> On 01/29/2014 04:48 AM, Kostya Serebryany wrote:
>> >>> [text only]
>> >>>
>> >>>> Indeed so, thanks!
>> >>>> So, exporting __signal_safe_memalign&co will allow us to extend the
>> >>>> existing hack to 2.19.
>> >>>> If this simple change can not be done for 2.19, can *anything* be done at all?
>> >>>> (Long term we'd still prefer something less hackish)
>> >>>
>> >>> FTR, I've implemented an even-uglier-than-before hack that deals with
>> >>> dynamic TLS in both <=2.18 and 2.19.
>> >>> So, we will survive the 2.19 release.
>> >>> But I would appreciate if we can resolve
>> >>> https://sourceware.org/bugzilla/show_bug.cgi?id=16291
>> >>> before the next one (2.20).
>> >>
>> >> Can you please describe the hack?
>> >
>> > intercept __tls_get_addr and __libc_memalign.
>> > if __libc_memalign is called while we are inside __tls_get_addr, we
>> > know we are in <= 2.18 mode and we know what to do.
>
> Or a signal handler happened to interrupt __tls_get_addr and something
> from the signal handler caused __libc_memalign to get called. Is this
> case handled?

Is __libc_memalign AS-safe? I guess not.
AddressSanitizer's implementation is certainly not.
So, if __libc_memalign is called in a signal handler it is a problem by itself.
That's what forced the change in 2.19 in the first place.


>
>> > if __libc_memalign was not called but the DSO ID passed to
>> > __tls_get_addr was not seen before by the current thread,
>> > we know that we are in 2.19 mode and that the TLS block was allocated
>> > by __signal_safe_memalign, which
>> > has a header with the actual block size. Ugly, as I said.
>>
>> Not terrible at all, quite elegant actually.
>
> I would say making assumptions about the format of the header is
> pretty ugly and might even break in the future,
can't agree more.

--kcc

Rich Felker

Jan 30, 2014, 12:50:28 PM
to Kostya Serebryany, Carlos O'Donell, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
It's legal to call anything (even non AS-safe functions) from a signal
handler if the signal handler did not interrupt a non-AS-safe
function. This is easy to guarantee if the only threads where the
signal is unblocked are only using AS-safe functions (a trivial
example would be a thread doing for (;;) pause();).
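
A single-threaded sketch of the same principle, under stated assumptions: the signal stays blocked everywhere except inside sigsuspend(), which is async-signal-safe, so the handler can only ever interrupt an AS-safe function and may itself call malloc. The names handler, run_demo and g_handler_ran are made up for illustration; a multithreaded program would use pthread_sigmask instead of sigprocmask.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static volatile sig_atomic_t g_handler_ran;

/* Calls malloc/free, which are not async-signal-safe. */
static void handler(int sig) {
  (void)sig;
  void *p = malloc(32);
  free(p);
  g_handler_ran = 1;
}

/* Returns 1 if the handler ran, -1 if setup failed. */
int run_demo(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

  /* Keep SIGUSR1 blocked while ordinary (non-AS-safe) code runs. */
  sigset_t block, old;
  sigemptyset(&block);
  sigaddset(&block, SIGUSR1);
  if (sigprocmask(SIG_BLOCK, &block, &old) != 0) return -1;

  raise(SIGUSR1); /* stays pending because the signal is blocked */

  /* Unblock SIGUSR1 only for the duration of sigsuspend(); the handler
     runs here, interrupting nothing but an AS-safe function. */
  sigset_t wait_mask = old;
  sigdelset(&wait_mask, SIGUSR1);
  sigsuspend(&wait_mask);

  sigprocmask(SIG_SETMASK, &old, NULL);
  return g_handler_ran;
}
```

The thread-with-pause() variant works the same way: every other thread keeps the signal blocked, so delivery can only happen in the thread that never leaves AS-safe code.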

Rich

Kostya Serebryany

Jan 31, 2014, 12:10:38 AM
to Rich Felker, Carlos O'Donell, Paul Pluzhnikov, Ondřej Bílka, Joseph S. Myers, Andrew Hunter, GNU C Library, Roland McGrath, address-...@googlegroups.com
Well, in <=2.18 this means that we can not call malloc in a signal
handler while using TLS, so for <=2.18 the hack is safe.
Then, there is a check in my hack that the result of the last
__libc_memalign is the same as the current result
of __tls_get_addr (minus offset), so if we are in 2.19 mode and
__libc_memalign is never called from __tls_get_addr
we are still safe even if __libc_memalign is called from a signal
handler. (I think...)
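
For illustration only, the pointer check described above could look like the helper below. The function name and parameters are hypothetical; how the real code obtains the offset is not shown in this thread and is simply passed in here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true only if the address returned by __tls_get_addr, minus the
   variable's offset inside its TLS block, is exactly the block most
   recently returned by __libc_memalign. A memalign call made for some
   unrelated reason (e.g. from a signal handler) fails this comparison. */
bool tls_block_from_memalign(const void *last_memalign_result,
                             const void *tls_get_addr_result,
                             size_t offset_in_block) {
  if (last_memalign_result == NULL) return false;
  return (uintptr_t)tls_get_addr_result - offset_in_block ==
         (uintptr_t)last_memalign_result;
}
```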


--kcc

>
> Rich

Lidija Bešker

Jul 11, 2018, 8:37:13 PM
to address-sanitizer
Hello,
I'm reaching out here because some tests are failing and the failures appear to be directly connected to this thread.
The dynamic TLS tests in LLVM are failing:

  LeakSanitizer-AddressSanitizer-x86_64 :: TestCases/Linux/use_tls_dynamic.cc
  LeakSanitizer-Standalone-x86_64 :: TestCases/Linux/use_tls_dynamic.cc
  MemorySanitizer-X86_64 :: dtls_test.c
  MemorySanitizer-lld-X86_64 :: dtls_test.c

I have concluded that a change in glibc caused this: the dynamic TLS allocation function was changed from memalign to malloc. This happened in glibc 2.25. To be exact, it was this change, https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fa9a646079950ea326e1f2feee7093580dc1ecd3, made for https://sourceware.org/bugzilla/show_bug.cgi?id=20432

I tried applying the same idea to malloc. It worked with msan, but I'm having trouble with lsan. Lsan intercepts and checks malloc, so adding the check from sanitizer_tls_get_addr fixes the tests in question but makes other tests fail. That is because some of the leaks they should detect are misinterpreted as dynamic TLS allocations and are not reported.
With memalign, __libc_memalign was called from elf/dl-tls.c and we used that to catch and process the allocation. With malloc this is impossible, because the new version of glibc calls malloc the same way the tests do.

So the question for now is: if we stick with this solution, how do we differentiate a dynamic TLS malloc from any other malloc? Or do we need a different approach to this problem?

Konstantin Serebryany

Jul 17, 2018, 11:52:40 AM
to address-sanitizer
Yes, we are aware of this problem. 
It is a consequence of the unfortunate lack of cross-testing and collaboration between LLVM and Glibc. 
We'll probably fix it eventually, after enough people complain loudly,
but our team is stretched too thin -- we can not give it a priority now. 
Patches are welcome to LLVM, glibc, and/or either project's testing infra.

--kcc 
