Re: TSan "Nested bug" intermittent failure

Dmitry Vyukov

May 6, 2020, 6:29:28 AM
to Christian Holler, thread-sanitizer, Vitaly Buka, address-sanitizer, memory-s...@googlegroups.com
On Wed, May 6, 2020 at 12:13 PM Christian Holler <cho...@mozilla.com> wrote:
>
> Hi,
>
> in our CI, we keep encountering this intermittent failure:
>
> > ==3406==WARNING: Symbolizer buffer too small
> > ==3406==WARNING: Symbolizer buffer too small
> > ThreadSanitizer:DEADLYSIGNAL
> > ThreadSanitizer: nested bug in the same thread, aborting.
>
> We are tracking this issue at
> https://bugzilla.mozilla.org/show_bug.cgi?id=1615608
>
> Any advice on how to debug/fix this problem? Also, if this has been
> addressed in a newer Clang version, would you mind pointing me at the
> fix, so we can backport it? We are still using Clang 9 in CI for now.

Hi Christian,

There seems to be some correlation between these "Symbolizer buffer
too small" warnings and the subsequent hard crash, right?
The bug probably affects all sanitizers, because the symbolizer code is
shared between them.
I wonder if we have a bug on that path. I suspect it may never have been tested.

I am not aware of any fixes in that area (though I have not been
following every fix).

I see several reasonable next steps:
1. Add a test with an extremely large function name in a report
(should be doable with some C++ recursive template magic), just to
check whether we have some stupid bug on that path. I wonder how
compilers handle function names >16K...

2. Extend the error message to dump max_length, read_len, the input
command and what has been read from the symbolizer so far.
It may provide some insight into what happens.
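
As a rough illustration of step 1, here is a minimal sketch of how recursive templates can produce a function whose demangled name is longer than 16K. The names Pair, Doubled and PrettyNameLength are made up for this example, and __PRETTY_FUNCTION__ is a GCC/Clang extension that is used here only to measure the name length. A real test would additionally need such a function to show up in a sanitizer report stack, which this sketch does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper type: holds two type parameters and nothing else.
template <typename A, typename B>
struct Pair {};

// Each level mentions the previous level twice, so the spelled-out
// (demangled) type name roughly doubles in length per level.
template <unsigned Depth>
struct Doubled {
  using type = Pair<typename Doubled<Depth - 1>::type,
                    typename Doubled<Depth - 1>::type>;
};

template <>
struct Doubled<0> {
  using type = int;
};

// Returns the length of this instantiation's pretty name, which contains
// the fully expanded spelling of T (GCC/Clang extension).
template <typename T>
std::size_t PrettyNameLength() {
  return std::strlen(__PRETTY_FUNCTION__);
}
```

With Depth = 12 the type contains 4096 leaf "int"s, so the expanded name is already well above the 16384-byte buffer while the template recursion depth stays tiny.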

Christian Holler

Jul 17, 2020, 4:48:07 AM
to Dmitry Vyukov, thread-sanitizer, Vitaly Buka, address-sanitizer, memory-s...@googlegroups.com
Sorry for the long delay here, but I finally found time to work on this
and managed to reproduce it with a debug patch applied to our
compiler-rt.

So far, the only information I have comes from changing the output here:

> @@ -530,12 +531,23 @@ bool SymbolizerProcess::ReadFromSymbolizer(char *buffer, uptr max_length) {
>      if (ReachedEndOfOutput(buffer, read_len))
>        break;
>      if (read_len + 1 == max_length) {
> -      Report("WARNING: Symbolizer buffer too small\n");
> +      Report("WARNING: Symbolizer buffer too small (%zu, %zu, %zu)\n", read_len, max_length, just_read);
>        read_len = 0;
>        break;
>      }
>    }

With that, I see that the WARNINGs look like this:

[task 2020-07-16T19:57:55.124Z] 19:57:55     INFO - GECKO(1266) | ==1385==WARNING: Symbolizer buffer too small (16383, 16384, 4095)
[task 2020-07-16T19:57:55.125Z] 19:57:55     INFO - GECKO(1266) | ==1385==WARNING: Symbolizer buffer too small (16383, 16384, 4094)
[task 2020-07-16T19:57:55.126Z] 19:57:55     INFO - GECKO(1266) | ==1385==WARNING: Symbolizer buffer too small (16383, 16384, 16383)
[task 2020-07-16T19:57:55.127Z] 19:57:55     INFO - GECKO(1266) | ThreadSanitizer:DEADLYSIGNAL
[task 2020-07-16T19:57:55.127Z] 19:57:55     INFO - GECKO(1266) | ThreadSanitizer: nested bug in the same thread, aborting.

The warning on its own shows up quite often, and in those cases
`just_read` is usually around 4094/4095.

However, when the "nested bug" appears, the warning right before it
*always* has `just_read` = 16383 (max_length - 1).
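
To make the three numbers easier to interpret, here is a simplified model of the read loop, based only on the diff context quoted above. It is not the actual compiler-rt code: SimulateRead, its parameters and the assumption that the source never produces an end-of-output marker are all made up for illustration, and the 4096-byte chunk size is a guess at how the data arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Values that the patched warning would print, plus whether it fired.
struct ReadResult {
  std::size_t read_len;
  std::size_t max_length;
  std::size_t just_read;
  bool overflowed;
};

// Hypothetical model: `chunk` is the most one read can return and
// `available` is the total number of bytes the source will deliver.
// The end-of-output check is left out, i.e. the marker is assumed
// never to appear within the buffer.
ReadResult SimulateRead(std::size_t max_length, std::size_t chunk,
                        std::size_t available) {
  std::size_t read_len = 0;
  while (true) {
    // Leave one byte of room for the terminating NUL.
    std::size_t want = max_length - read_len - 1;
    std::size_t just_read = std::min({want, chunk, available});
    if (just_read == 0)
      return {read_len, max_length, 0, false};  // source ran dry
    available -= just_read;
    read_len += just_read;
    if (read_len + 1 == max_length)
      return {read_len, max_length, just_read, true};  // "buffer too small"
  }
}
```

In this model, 4096-byte chunks give three full reads followed by a final read of 4095 bytes, i.e. (16383, 16384, 4095). A `just_read` of 16383 can only come out if `read_len` was still 0 before that read, meaning the whole buffer was filled by the very first read.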

I've been trying to output the buffer, but I am having difficulties in
doing so (not sure if this is a problem in our CI or a problem in my
patch, I will keep trying).
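
For reference, this is roughly what I am trying to get out of the buffer: a bounded, escaped prefix that is safe to put on a single log line. EscapeForLog is a hypothetical helper written against the C++ standard library for illustration only. It is not part of compiler-rt, and a version for the sanitizer runtime would have to avoid std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: returns at most `max_bytes` bytes of `data`,
// with newlines, backslashes and non-printable bytes escaped, and
// "..." appended if the input was truncated.
std::string EscapeForLog(const char *data, std::size_t len,
                         std::size_t max_bytes) {
  std::string out;
  std::size_t n = len < max_bytes ? len : max_bytes;
  for (std::size_t i = 0; i < n; ++i) {
    unsigned char c = static_cast<unsigned char>(data[i]);
    if (c == '\n') {
      out += "\\n";
    } else if (c == '\\') {
      out += "\\\\";
    } else if (c >= 0x20 && c < 0x7f) {
      out += static_cast<char>(c);
    } else {
      char hex[5];  // "\xNN" plus the terminating NUL
      std::snprintf(hex, sizeof(hex), "\\x%02x", static_cast<unsigned>(c));
      out += hex;
    }
  }
  if (n < len)
    out += "...";
  return out;
}
```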

If you have any idea what might be happening around this particular edge
case, that would be great.

I also tried locally what you suggested and tested sanitizer symbolizing
with huge templates, but I was not able to reproduce the bug at all.


Cheers,

Chris

Dmitry Vyukov

Jul 18, 2020, 5:02:34 AM
to Christian Holler, thread-sanitizer, Vitaly Buka, address-sanitizer, memory-s...@googlegroups.com
Maybe this symbolizer warning is just a red herring, hard to say.

I've tried to create very long identifiers using this modified tsan test:
https://gist.githubusercontent.com/dvyukov/a8f167c5e62349ede83db3b69e77533b/raw/d500a54803d04140a581705a948844fa69080891/gistfile1.txt
giving the compiler different values for -DXXX=NNN, e.g. 1000, 2000,
and also with -DLongLongLongLong=AAA.

It easily triggers the "WARNING: Symbolizer buffer too small", but no crashes.

I don't have any other ideas so far. Maybe just some memory corruption.