Dozens of crashes found with libdislocator, but cannot reproduce them


Johannes S.

unread,
Nov 8, 2016, 4:43:03 PM11/8/16
to afl-users
Some of my fuzzers are run with libdislocator. In particular, one instance keeps finding new crashes every day - I kept running it for 3 weeks and it found 90+ crashes - but I am unable to verify any of them.

I use a shell script like this to verify all crashes:

export LD_PRELOAD=/path/to/afl/libdislocator/libdislocator.so
FILES=/path/to/crashes/*
for f in $FILES
do
        echo "$f"
        ( ulimit -Sv $((100 << 10)) -c unlimited; LD_LIBRARY_PATH=. ./fuzzed-binary "$f" )
        mv core "core.$(basename "$f")"
done

The ulimit obviously matches that of the fuzzer instance. However, I cannot reproduce a single one of those 90+ crashes, whether I export LD_PRELOAD or pass it to the binary on the command line. The crashing tests are also completely random (i.e. never several tests stemming from the same source test), which typically indicates spurious crashes rather than a real bug in the binary.
For what it's worth, I am also unable to verify the crashes on other platforms using the same source. I am also unable to verify the crashes using any of afl's companions like afl-tmin - it always comes up with 0 byte files.
The only load on the system is generated by the fuzzer instances, and there are plenty of resources left.
Maybe it's related to my other thread with the "Corrupted head alloc canary" issue?

Michal Zalewski

unread,
Nov 8, 2016, 5:11:52 PM11/8/16
to afl-users
> For what it's worth, I am also unable to verify the crashes on other
> platforms using the same source.

There's some possibility that you're dealing with some
non-deterministic crashes due to concurrency, etc. A way to
troubleshoot that would be to retry any of the crashing test cases,
say, 10k times (just a shell script loop or so).

That said:

> I am also unable to verify the crashes
> using any of afl's companions like afl-tmin - it always comes up with 0 byte
> files.

This sounds very odd. What does afl-tmin say, exactly? A file being
shrunk to 0 bytes would imply that the program is not reading the
input at all. Perhaps it's running out of memory, or is not called
correctly (e.g., expecting data on stdin, not in the file in argv[1])?

Running it under afl-showmap -o /dev/null may provide some hints.

> Maybe it's related to my other thread with the "Corrupted head alloc canary"
> issue?

It shouldn't be; I don't know what's up with that, but it's happening
in a very different place.

/mz

Johannes S.

unread,
Nov 8, 2016, 5:57:53 PM11/8/16
to afl-users

> There's some possibility that you're dealing with some
> non-deterministic crashes due to concurrency, etc. A way to
> troubleshoot that would be to retry any of the crashing test cases,
> say, 10k times (just a shell script loop or so).

Do you mean concurrency on the OS level or the application level? The latter would be rather strange since I couldn't think of any part of the code which would be responsible for that (this is a single-threaded library).
I do have random crashes that are probably due to OS level concurrency on my other, non-libdislocator fuzzers, but those amount to maybe 1 per week at maximum - not dozens like in this case.
Anyway, based on quick tests I cannot reproduce the issue when executing the test case 10,000 times either. Doing this for all test cases will take a while so I will report back later in case this changes.
 

> That said:
>
> > I am also unable to verify the crashes
> > using any of afl's companions like afl-tmin - it always comes up with 0 byte
> > files.
>
> This sounds very odd. What does afl-tmin say, exactly? A file being
> shrunk to 0 bytes would imply that the program is not reading the
> input at all. Perhaps it's running out of memory, or is not called
> correctly (e.g., expecting data on stdin, not in the file in argv[1])?

Never mind - the command line for afl-tmin was incorrect; now it minimizes the files in non-crashing mode. Using the same memory limit as the afl-fuzz process from which the crashes came, afl-tmin still doesn't crash on these files.

Cheers,
Johannes

Johannes S.

unread,
Nov 8, 2016, 7:40:23 PM11/8/16
to afl-users
Also, worse than those random crashes: when resuming the job (with an updated and recompiled binary), queue entries that previously worked just fine will crash *consistently*, so I doubt there's a random factor here. I think I have asked for this before, but since this keeps happening and I have to delete queue entries one by one until the fuzzer finally resumes its work, I'll ask again: would it be possible to write out core dumps for crashed test cases, at the very least during the initial dry run?

Michal Zalewski

unread,
Nov 8, 2016, 7:53:32 PM11/8/16
to afl-users
It's possible, but not trivial (basically, core dumps are slow and can
really mess up timing / crash detection). If the crashes were due to a
page fault, you may have some useful messages (including the offending
addresses) in the dmesg.

Another suggestion for your testing: to make sure that
libdislocator.so is working as expected on your test runs outside AFL,
you may want to set AFL_LD_VERBOSE=1 in the environment. This should
produce verbose messages from the library; if you don't see any,
something is probably wrong with the LD_PRELOAD stuff.

Johannes S.

unread,
Nov 8, 2016, 8:05:33 PM11/8/16
to afl-users

> It's possible, but not trivial (basically, core dumps are slow and can
> really mess up timing / crash detection). If the crashes were due to a
> page fault, you may have some useful messages (including the offending
> addresses) in the dmesg.

dmesg has some messages but apparently only from when the fuzzer was running, not from the time I tried to resume it (i.e. when it crashed on a previously working queue entry).
Since I recompiled the binary in the meantime, I'll see if I can make sense out of the addresses found in dmesg once I get to resume this instance and the crashes start happening again.

I understand the problem with core dumps, but my thought was that maybe once a crash is detected, the same test case could be tried again but with core dumps enabled? It's probably way more complicated than that, though.
 

> Another suggestion for your testing: to make sure that
> libdislocator.so is working as expected on your test runs outside AFL,
> you may want to set AFL_LD_VERBOSE=1 in the environment. This should
> produce verbose messages from the library; if you don't see any,
> something is probably wrong with the LD_PRELOAD stuff.

The output gets pretty chatty with AFL_LD_VERBOSE=1, so I guess things are okay here.

Johannes S.

unread,
Nov 8, 2016, 8:13:30 PM11/8/16
to afl-users

> dmesg has some messages but apparently only from when the fuzzer was running, not from the time I tried to resume it (i.e. when it crashed on a previously working queue entry).
> Since I recompiled the binary in the meantime, I'll see if I can make sense out of the addresses found in dmesg once I get to resume this instance and the crashes start happening again.


Never mind what I said about dmesg - the only things it caught were unrelated, genuine crashes (which were fixed, and were the reason I restarted the fuzzing job in the first place). The 90+ mysterious crashes didn't leave any trace there.

Johannes S.

unread,
Jul 2, 2017, 6:29:40 PM7/2/17
to afl-users
So... I am still having this issue - I have crashes on seemingly random files that are only found with libdislocator (ASAN finds nothing, Visual Studio debugger finds nothing, ...) and I need to remove those testcases from the queue to be able to resume fuzzing at all. Reading about the recent addition of the -c parameter to afl-showmap, I thought that this could finally help me with getting closer to the issue, but nope.
I tried two variations:

AFL_PRELOAD=contrib/fuzzing/afl/libdislocator/libdislocator.so contrib/fuzzing/afl/afl-showmap -c -m 100 -o tracedata -- bin/fuzz crashing_queue_entry

( ulimit -c unlimited; AFL_PRELOAD=contrib/fuzzing/afl/libdislocator/libdislocator.so contrib/fuzzing/afl/afl-showmap -c -m 100 -o tracedata -- bin/fuzz crashing_queue_entry )

In both cases the output is as follows:

afl-showmap 2.44b by <lca...@google.com>
[*] Executing 'bin/fuzz'...

-- Program output begins --
-- Program output ends --

+++ Program killed by signal 11 +++
[+] Captured 1278 tuples in 'tracedata'.

And no core dump is being generated. (I know that the latter command would normally create a file called "core" in the current directory; this is a Debian server setup, not a desktop setup where core dumps might be redirected elsewhere by some desktop tool.)
Given that afl-showmap somehow figures out that my program is crashing with a segfault, I really should be able to get my hands on the crash location, but it's giving me a hard time.

Johannes S.

unread,
Jul 2, 2017, 6:38:16 PM7/2/17
to afl-users
An addition, in case it was not clear enough: The crashes do not happen if I test this specific file outside of afl-fuzz / afl-showmap, and the crash is deterministic in those tools, i.e. I get the same crash on every run.

Jakub Wilk

unread,
Jul 3, 2017, 8:29:02 AM7/3/17
to afl-...@googlegroups.com
* Johannes S. <saga...@gmail.com>, 2017-07-02, 15:29:
>I have crashes on seemingly random files that are only found with
>libdislocator (ASAN finds nothing, Visual Studio debugger finds nothing, ...)
>and I need to remove those testcases from the queue to be able to resume
>fuzzing at all. Reading about the recent addition of the -c parameter to
>afl-showmap, I thought that this could finally help me with getting closer to
>the issue, but nope.
>I tried two variations:
>
>AFL_PRELOAD=contrib/fuzzing/afl/libdislocator/libdislocator.so
>contrib/fuzzing/afl/afl-showmap -c -m 100 -o tracedata -- bin/fuzz
>crashing_queue_entry
>
>( ulimit -c unlimited;
>AFL_PRELOAD=contrib/fuzzing/afl/libdislocator/libdislocator.so
>contrib/fuzzing/afl/afl-showmap -c -m 100 -o tracedata -- bin/fuzz
>crashing_queue_entry )

afl-fuzz attempts to set the core limit to unlimited on its own, but AFAICT it
would only succeed if the hard limit was infinity originally.

I don't think this is the bug you're running into, because "ulimit -c" would
fail loudly if it was.

Perhaps run it under "strace -f" and see if there's anything suspicious in the
strace output?

You might also want to consult the core(5) manpage, which contains a long list
of circumstances in which a core dump file is not produced.

>(I know that the latter command would normally create a file called "core" in
>the current directory,

OK...

>this is on a Debian server setup, not a desktop setup where the core dumps
>might be put elsewhere by some desktop tool).

This sounds more like "I guess" than "I know".
What's in your /proc/sys/kernel/core_pattern file?

--
Jakub Wilk

Johannes S.

unread,
Jul 3, 2017, 5:57:05 PM7/3/17
to afl-users


> Perhaps run it under "strace -f" and see if there's anything suspicious in the
> strace output?

The last few lines in the log before returning to the parent process are:

41768 mprotect(0x7ffff1c33000, 4096, PROT_NONE) = 0
41768 mprotect(0x7ffff1c39000, 4096, PROT_NONE) = 0
41768 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
41768 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
41768 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7ffff1c33000} ---
41768 +++ killed by SIGSEGV +++

Those mmap calls look like the ones coming from afl's instrumentation. si_addr=0x7ffff1c33000 corresponds to one of those calls. I think we're really hitting a rare bug in the interaction between the instrumentation and libdislocator here, and that would explain why I cannot reproduce it when the process is run from the shell rather than through afl.


>this is on a Debian server setup, not a desktop setup where the core dumps
>might be put elsewhere by some desktop tool).

> This sounds more like "I guess" than "I know".
> What's in your /proc/sys/kernel/core_pattern file?

No, it is an "I know". The core_pattern is "core".

Cheers, Johannes

Michal Zalewski

unread,
Jul 3, 2017, 9:24:33 PM7/3/17
to afl-users
> 41768 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
> 41768 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
> 41768 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7ffff1c33000} ---
> 41768 +++ killed by SIGSEGV +++

That's just libdislocator failing to allocate memory (and returning
NULL). You may want to try under GDB to see what's actually causing
the segv after this failure.

> Those mmap calls look like the ones coming from afl's instrumentation. si_addr=0x7ffff1c33000 corresponds to one of those calls.

What do you mean specifically?

/mz

Jakub Wilk

unread,
Jul 4, 2017, 6:36:37 AM7/4/17
to afl-...@googlegroups.com
* Johannes S. <saga...@gmail.com>, 2017-07-03, 14:57:
>>Perhaps run it under "strace -f" and see if there's anything suspicious in
>>the strace output?

To clarify, I meant to look for things that may prevent dumping core, e.g.
setrlimit, prlimit, prctl or chdir syscalls.

>41768 +++ killed by SIGSEGV +++

OK, so the core indeed wasn't dumped.
(strace would print "(core dumped)" if it was.)

--
Jakub Wilk

Johannes S.

unread,
Jul 4, 2017, 3:41:01 PM7/4/17
to afl-users

> That's just libdislocator failing to allocate memory (and returning
> NULL). You may want to try under GDB to see what's actually causing
> the segv after this failure.

I tried gdb first, but it didn't help either. It simply continues execution of afl-showmap after the segfault of the child process, and afl-showmap then quits normally. And as I said before, the crashes only occur when run inside afl(-showmap); if I run the executable with the same memory limits, it doesn't crash, so I naturally cannot debug any segfault. It also doesn't crash if I vary the memory limit by a few KB or MB.

 

> > Those mmap calls look like the ones coming from afl's instrumentation. si_addr=0x7ffff1c33000 corresponds to one of those calls.
>
> What do you mean specifically?

Okay, having a look at the calls again, the address comes from an mprotect call which is inserted by libdislocator, not by the afl instrumentation. So the crash supposedly happens when accessing a protected page, but neither ASan nor any other tools can find this crash on the dozens of crashing test cases I got. By now I find it rather unlikely that the crash is somewhere in my code.



> To clarify, I meant to look for things that may prevent dumping core, e.g. setrlimit, prlimit, prctl or chdir syscalls.

I attached the log, but I could not find anything in there from which I could deduce that core dumping is being disabled.
Core dumping works perfectly fine with any of the other reproducible crashes I had in the past, by the way.

Cheers,
Johannes

Johannes S.

unread,
Jul 4, 2017, 3:42:08 PM7/4/17
to afl-users
...and here's the log I meant to attach.
tr.zip

Jakub Wilk

unread,
Jul 4, 2017, 4:01:13 PM7/4/17
to afl-...@googlegroups.com
* Johannes S. <saga...@gmail.com>, 2017-07-04, 12:41:
>I attached the log but I could not find anything in there from which I could
>deduce that anything is being disabled.

Thanks. So the log says:

41767 execve("contrib/fuzzing/afl/afl-showmap", ["contrib/fuzzing/afl/afl-showmap", "-c", "-m", "100", "-o", "tracedata", "--", "bin/fuzz", "/home/saga/fuzzers/fuzzer04/queu"...], [/* 21 vars */]) = 0
...
41768 setrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=0}) = 0
41768 execve("bin/fuzz", ["bin/fuzz", "/home/saga/fuzzers/fuzzer04/queu"...], [/* 27 vars */]) = 0

So this is afl-showmap disabling core dumping... Huh?
I looked at the source again, which says:

if (keep_cores) r.rlim_max = r.rlim_cur = 0;
else r.rlim_max = r.rlim_cur = RLIM_INFINITY;

This is backwards. Bad Michal, bad. :-P

--
Jakub Wilk

Johannes S.

unread,
Jul 4, 2017, 4:05:26 PM7/4/17
to afl-users


> This is backwards.

And I thought I was losing my sanity. :) In the meantime I also managed to get gdb running ("set follow-fork-mode child" does the magic). gdb now shows a crash in a line of code in my library that, in theory, should never be able to crash (all it does is check that an index into a vector is valid). This doesn't make it any easier...

- Johannes

Michal Zalewski

unread,
Jul 4, 2017, 4:50:00 PM7/4/17
to afl-users
> I tried gdb first, but it didn't help either. It simply continues execution
> of afl-showmap after the segfault of the child process, and afl-showmap then
> quits normally.

Right, but you should be able to attach to the child, not to the
parent. Try "set follow-fork-mode child". If that doesn't work, add
sleep(60) or so at the beginning of the target program, and then
attach by PID.

> Okay, having a look at the calls again, the address comes from an mprotect
> call which is inserted by libdislocator, not by the afl instrumentation. So
> the crash supposedly happens when accessing a protected page, but neither
> ASan nor any other tools can find this crash on the dozens of crashing test
> cases I got. By now I find it rather unlikely that the crash is somewhere in
> my code.

Oh, I'm not saying that libdislocator is bug-free. But we'd need to
narrow it down to some offending pattern in the underlying code (see
above).

/mz

Johannes S.

unread,
Jul 4, 2017, 4:56:20 PM7/4/17
to afl-users
Thanks to "set follow-fork-mode child" I finally managed to track down this particular crash - it was reading one item beyond a container's size in low-memory situations when a previous allocation failed. Hence I was unable to find it with ASan in previous debugging sessions, and when manually setting the ulimit, it was probably a matter of bytes or kilobytes that made the difference compared to running in an afl environment. This particular crash is resolved now, but there are probably dozens of others waiting that I cannot explain... I'll post here if any of them turn out to be as mysterious as this one again.

- Johannes

Michal Zalewski

unread,
Jul 4, 2017, 5:02:52 PM7/4/17
to afl-users
:-) Cool!

Another case that may appear only with libdislocator is
"use-after-free" after calling realloc() to resize a buffer. This is
because realloc() may resize in place, especially if the new size is
smaller than the old size; in such a case, the "old" pointer (now
theoretically invalid) and the "new" one (returned by realloc()) are
indistinguishable. But libdislocator always forces realloc() to return
a new address and always mprotect()s the old region.

Cheers,
/mz

Johannes S.

unread,
Jul 4, 2017, 10:14:22 PM7/4/17
to afl-users
> I'll post here if any of them turn out to be as mysterious as this one again.

Okay, I think I might have finally found the reason for all the other crashes that have been accumulating over time. They were all SIGABRTs, caused by a call to mktime!

#0  0x00007ffff6319067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff631a448 in __GI_abort () at abort.c:89
#2  0x00007ffff6312266 in __assert_fail_base (fmt=0x7ffff644af18 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ffff644893f "num_types == 1", file=file@entry=0x7ffff6448936 "tzfile.c", line=line@entry=779, function=function@entry=0x7ffff644fc90 <__PRETTY_FUNCTION__.6261> "__tzfile_compute") at assert.c:92
#3  0x00007ffff6312312 in __GI___assert_fail (assertion=assertion@entry=0x7ffff644893f "num_types == 1", file=file@entry=0x7ffff6448936 "tzfile.c", line=line@entry=779, function=function@entry=0x7ffff644fc90 <__PRETTY_FUNCTION__.6261> "__tzfile_compute") at assert.c:101
#4  0x00007ffff6391907 in __tzfile_compute (timer=1447111074, use_localtime=use_localtime@entry=1, leap_correct=leap_correct@entry=0x7fffffffd848, leap_hit=leap_hit@entry=0x7fffffffd844, tp=tp@entry=0x7fffffffd960) at tzfile.c:779
#5  0x00007ffff6390429 in __tz_convert (timer=0x7fffffffd948, use_localtime=1, tp=0x7fffffffd960) at tzset.c:635
#6  0x00007ffff638eab0 in ranged_convert (convert=0x7ffff638e940 <__localtime_r>, t=0x7fffffffd948, tp=0x7fffffffd960) at mktime.c:310
#7  0x00007ffff638edd5 in __mktime_internal (tp=0x7fffffffdab0, convert=0x7ffff638e940 <__localtime_r>, offset=0x7ffff668bab8 <localtime_offset>) at mktime.c:478
#8  0x00007ffff6c02083 in OpenMPT::mpt::Date::Unix::FromUTC (timeUtc=...) at common/mptTime.cpp:115
(...)

Our own code is nothing more than this:
tm t = timeUtc;
time_t localSinceEpoch = mktime(&t);

...so I guess I found a bug in the standard library implementation that can only be triggered in low-memory situations?

- Johannes

Johannes S.

unread,
Jul 5, 2017, 4:03:48 PM7/5/17
to afl-users
... looks like this is indeed another glibc trophy for the trophy case, thanks to libdislocator =)

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867283