Does it make sense to run afl very long?


Hanno Böck

Mar 29, 2015, 6:55:43 AM
to afl-users
Hi,

Just something that's been in my mind for a while: I hear and read
quite often that people tend to fuzz for very long times - weeks or
even months - on a single software/input.

I wonder how much sense that makes.
My personal experience is pretty much that most bugs turn up within
minutes, some within hours, and once the process has run for a day I
don't expect anything interesting to show up any more.

I feel that there is a lot of resource wasting going on (add to that
that some people probably fuzz without noticing that their setup
doesn't do anything at all). When afl doesn't find something
within a day I see this as a signal that I need to move on and try
something new.

And to make this a bit more concrete: If you feel you had relevant
success in the past after fuzzing more than a day on a reasonably
current machine can you post the details in a way that I can try to
reproduce it? (Something like "I found CVE-2014-xxxx in libxyz
version 1.2.3 - afl ran for three days without any crashes and on day
four I found it")

I would like to see this question answered in a reasonable way to give
people better guidance on how to fuzz, so I intend to set up some
experiments to replicate past bug findings.

cu,
--
Hanno Böck
http://hboeck.de/

mail/jabber: ha...@hboeck.de
GPG: BBB51E42

Peter Gutmann

Mar 29, 2015, 8:03:43 AM
to Hanno Böck, afl-users
Hanno Böck <ha...@hboeck.de> writes:

>(add to that that some people probably fuzz without noticing that their setup
>doesn't do anything at all)

Actually that's something I was wanting to bring up as a side-effect of trying
to sort out the file-truncation problem I mentioned in an earlier post (which
I've now determined is limited to just one system, so "don't do that, then" is
a quick fix for now).

afl doesn't provide any easy way to distinguish general diagnostic output from
status-screen output, it's really all-or-nothing, which makes it a pain to try
and script because the useful output ("something seems to have gone wrong and
afl is now spinning in a tight loop") is mixed in with endless status-screen
updates. Would it be possible to add an option to disable the status screen,
so only the general diagnostic output is produced?

Related to this, I'm using fuzzer_stats to monitor and display progress (so I
get pinged when things happen); however, doing a resume ("-i -") seems to reset
the stats, so the monitoring script can't track the current state. It'd be
good to have afl continue from the previous fuzzer_stats info rather than
resetting the counters.
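A monitoring script along these lines can be sketched in a few lines of Python (my own sketch, not something from afl itself; it assumes the usual fuzzer_stats layout of one "key : value" pair per line, with last_update holding a Unix timestamp):

```python
import time
from pathlib import Path

def read_stats(path):
    """Parse afl's fuzzer_stats (one 'key : value' pair per line) into a dict."""
    stats = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            stats[key.strip()] = value.strip()
    return stats

def looks_stalled(stats, max_idle=300, now=None):
    """Flag a fuzzer whose last_update timestamp is older than max_idle seconds."""
    now = time.time() if now is None else now
    return now - int(stats["last_update"]) > max_idle
```

Polling looks_stalled() every few minutes is enough to get pinged when a fuzzer dies, independent of what happens to the counters on a resume.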

tl;dr: There are some (hopefully minor) changes that could be made to afl to
make it more easily scriptable.

Peter.

Michael Rash

Mar 29, 2015, 8:27:28 AM
to afl-users
On Sun, Mar 29, 2015 at 6:55 AM, Hanno Böck <ha...@hboeck.de> wrote:
Hi,

Just something that's been in my mind for a while: I hear and read
quite often that people tend to fuzz for very long times - weeks or
even months - on a single software/input.

I wonder how much sense that makes.
My personal experience is pretty much that most bugs turn up within
minutes, some within hours and when the process ran for a day I don't
expect that anything interesting will show up any more.

I feel that there is a lot of resource wasting going on (add to that
that some people probably fuzz without noticing that their setup
doesn't do anything at all). When afl doesn't find something
within a day I see this as a signal that I need to move on and try
something new.

And to make this a bit more concrete: If you feel you had relevant
success in the past after fuzzing more than a day on a reasonably
current machine can you post the details in a way that I can try to
reproduce it? (Something like "I found CVE-2014-xxxx in libxyz
version 1.2.3 - afl ran for three days without any crashes and on day
four I found it")

While I can't personally produce data of the type you mention above, my assumption has always been that there is a chance AFL could find a crash until the "pending" stat goes to zero regardless of how long this takes. That is, if "pending" implies execution paths that AFL hasn't been able to exercise yet, couldn't a crash be found at any point as these new paths are fuzzed? And, even when there are zero pending paths, the havoc stage could still turn something up, although perhaps this is a lot less likely?

Thanks,

--Mike

 




--
Michael Rash
http://www.cipherdyne.org/
Key fingerprint = 53EA 13EA 472E 3771 894F  AC69 95D8 5D6B A742 839F

Michal Zalewski

Mar 29, 2015, 12:48:32 PM
to afl-users
> Just something that's been in my mind for a while: I hear and read
> quite often that people tend to fuzz for very long times - weeks or
> even months - on a single software/input.

It really depends, I think. In most cases, after the fuzzer completes
the first pass, the likelihood of new discoveries goes down. By the
time the "pending" counter gets to zero, the residual likelihood is
gonna be very, very low.
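The two milestones described here (first pass done, pending queue drained) can be turned into a rough keep-going test; a sketch of mine, assuming the stats have been parsed into a dict of strings containing the standard cycles_done and pending_total fields from fuzzer_stats:

```python
def keep_fuzzing(stats, min_cycles=1):
    """Heuristic from the advice above: keep going until the fuzzer has
    finished at least one full cycle AND the queue of pending (not yet
    fuzzed) paths has drained to zero."""
    cycles_done = int(stats.get("cycles_done", 0))
    pending_total = int(stats.get("pending_total", 0))
    return cycles_done < min_cycles or pending_total > 0
```

This only approximates the advice; even with zero pending paths the havoc stage can still turn something up, just with much lower probability.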

Now, the first pass can take a couple of hours for fast targets, but
can be more time-consuming for slow ones.

That said, I had some bugs that cropped up only after a longer run
(including libxml2, sqlite, libjpeg, etc).

/mz

Michal Zalewski

Mar 29, 2015, 12:53:24 PM
to afl-users, Hanno Böck
> afl doesn't provide any easy way to distinguish general diagnostic output from
> status-screen output, it's really all-or-nothing, which makes it a pain to try
> and script because the useful output ("something seems to have gone wrong and
> afl is now spinning in a tight loop") is mixed in with endless status-screen
> updates. Would it be possible to add an option to disable the status screen,
> so only the general diagnostic output is produced?

What do you mean by "general diagnostic output"? As you noticed, the
fuzzer writes fairly useful stats to fuzzer_stats in the output
directory (and there is also plot_data with historical results). You
can use afl-whatsup and afl-plot to convert them to something more
readable, or use the same principle to implement your own metrics.
Would you want to see some other data in the file?

If you don't want the status screen, simply point stdout to a non-tty
file descriptor.

> Related to this, I'm using fuzzer_stats to monitor and display progress (so I
> get pinged when things happen), however doing a resume ("-i -") seems to reset
> the stats, so the monitoring script can't track the current state. It'd be
> good to have afl continue from the previous fuzzer_stats info rather than
> resetting the counters.

This is tricky, because users can change settings or make other
adjustments for resumed sessions. If something is wrong with the
resumed one, showing them old stats will probably do more harm than
good, because they won't notice that they are no longer finding
anything.

/mz

floyd

Mar 30, 2015, 3:30:21 AM
to afl-...@googlegroups.com
My experience: Been running very slow targets on a pretty slow CPU VM
for 100+ days (would otherwise be an idling machine). Because I don't
have time to check on the fuzzer every week it just sits there and
fuzzes a target that is able to parse a lot of file formats. In this
case afl turns up new file-format variants from time to time. But this
is probably an edge case and, as said in a previous thread, my input
selection is probably already OK, though it could be better.

Time + power consumption is a laziness trade-off.
floyd
@floyd_ch
http://www.floyd.ch

Peter Gutmann

Mar 30, 2015, 5:57:38 AM
to afl-...@googlegroups.com
Michal Zalewski <lca...@gmail.com> writes:

>What do you mean by "general diagnostic output"?
>[...]
>If you don't want the status screen, simply point stdout to a non-tty file
>descriptor.

That doesn't work because it also throws away the information I do want to
see. The problem is that afl-fuzz produces two lots of output, the stuff I
need to see:

[*] Checking core_pattern...
[*] Setting up output directories...
[+] Output directory exists, will attempt session resume.
[*] Deleting old session data...
[+] Output dir cleanup successful.
[*] Rotating shield harmonics...
[*] Scanning '[...]'...
[+] Loaded 50 auto-discovered dictionary tokens.
[*] Creating hard links for all input files...
[*] Recalibrating sensor array...
[*] Validating target binary...
[*] Attempting dry run with 'id:000000,orig:input.dat'...
[*] Reversing polarity of neutron flow...
[*] Spinning up the fork server...
[+] All right - fork server is up.
len = 1552, map size = 2070, exec speed = 13509 us
[...]

and the stuff I don't (the status screen). There's no way to separate the
two; you either see both or neither. To give an example of why this is a
problem: I currently have several runs of afl-fuzz dying at some point
(eventually I notice the processes aren't active any more), but the
reason is buried in several hundred MB of nohup.out log. If there were a
way to say:

afl-fuzz -silent .... > fuzz.log

(which outputs the diagnostic info but not the continuous status-screen
updates) then I could see where the problem is. One way to do this perhaps
would be to allow a user-set update frequency for the status screen, 0s = no
update (i.e. never display it), 1s (or whatever) = default, and then you could
ask for something like an update every ten minutes or so, enough to see it's
still running but not enough to flood a log file.
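Until something like a -silent option exists, one workaround is to post-filter the captured log; a sketch of my own, assuming the status-screen redraws are what fill the log with terminal escape sequences, while the diagnostic lines keep the [*]/[+]/[-]/[!] prefixes shown in the excerpt above:

```python
import re

# CSI escape sequences (cursor movement, colors, screen clears) as emitted
# by the status screen; the exact set used is an assumption on my part.
ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def diagnostic_lines(raw):
    """Strip terminal escape codes from a captured afl log and keep only
    the [*]/[+]/[-]/[!] diagnostic lines, dropping status-screen redraws."""
    out = []
    for line in ANSI.sub("", raw).splitlines():
        line = line.strip()
        if line[:3] in ("[*]", "[+]", "[-]", "[!]"):
            out.append(line)
    return out
```

Running this over a multi-hundred-MB nohup.out should shrink it to the few dozen lines that explain why a run died.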

>Would you want to see some other data in the file?

No, fuzzer_stats is just fine, but:

>This is tricky, because users can change settings or make other adjustments
>for resumed sessions. If something is wrong with the resumed one, showing
>them old stats will probably do more harm than good, because they won't
>notice that they are no longer finding anything.

The other side of the coin is that with the reset it's really hard to tell
what state the fuzzer is in without recording start_time and execs_done for
each snapshot of fuzzer_stats and then doing a pile of maths to track the
total. More worrying, though, is that there's no way to tell whether the
reset values represent a continuation of a previous fuzzing run or a
restart from scratch. At one point afl-fuzz went from havoc back to flip1
(with cycles_done constant at 0) after a restart, and I spent quite some
time trying to figure out whether it had started entirely from scratch (I
was never able to be certain; in the end I just let it run).

Actually, if you really want to retain the reset-stats-on-each-resume
semantics, what about also writing a fuzzer_stats_cumulative log or something
similar?
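Failing that, the "pile of maths" can live in the monitoring script; a sketch (my own, hypothetical) that keeps a cumulative execs_done total by treating any drop in the counter as a resume-triggered reset:

```python
class CumulativeExecs:
    """Track a running total of execs_done across resumes: when a new
    snapshot is lower than the last one, assume the counters were reset
    and roll the previous run's total into a base offset."""

    def __init__(self):
        self.base = 0   # execs accumulated in earlier (reset) runs
        self.last = 0   # most recent execs_done snapshot

    def update(self, execs_done):
        if execs_done < self.last:   # counter went backwards => reset
            self.base += self.last
        self.last = execs_done
        return self.base + self.last
```

This obviously can't distinguish a resume from a full restart either, but it at least keeps the totals monotonic for graphing.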

Peter.

h...@crypt.org

Mar 31, 2015, 3:58:59 PM
to afl-...@googlegroups.com, h...@crypt.org
Hanno Böck <ha...@hboeck.de> wrote:
:Just something that's been in my mind for a while: I hear and read
:quite often that people tend to fuzz for very long times - weeks or
:even months - on a single software/input.
:
:I wonder how much sense that makes.
:My personal experience is pretty much that most bugs turn up within
:minutes, some within hours and when the process ran for a day I don't
:expect that anything interesting will show up any more.

As others have suggested, I think this very much depends on the target
application. In fuzzing perl, I have not yet completed a cycle: the
longest run (around 10-12 days) was interrupted by an unavoidable
reboot due to hardware issues at around 70%, and the distinct crashes
found have been weighted only mildly towards the start of such runs.
I've taken a break for now to pursue other priorities, but plan to
revisit this with a run of (I guess) at least a month, hoping (finally)
to complete at least one cycle.

I didn't record at what stage of the cycle each of my own bug reports
was found, but the list below contains ticket numbers for bug reports
resulting from AFL fuzzing by three distinct contributors, which should
give you an idea of the range of issues found. (Tickets are in perl's RT
queue at rt.perl.org.)

Hugo
---
123539
123542
123554
123617
123652
123677
123710
123711
123712
123735
123736
123737
123753
123755
123759
123763
123765
123782
123790
123801
123802
123814
123816
123817
123821
123836
123840
123843
123846
123847
123848
123849
123852
123861
123870
123874
123893
123946
123951
123954
123955
123960
123961
123963
123991
123994
123995
124004
124097
124099
124187

Jodie Cunningham

Mar 31, 2015, 7:42:30 PM
to afl-...@googlegroups.com
On Sunday, March 29, 2015 at 5:55:43 AM UTC-5, hannobc wrote:
And to make this a bit more concrete: If you feel you had relevant
success in the past after fuzzing more than a day on a reasonably
current machine can you post the details in a way that I can try to
reproduce it? 

For a period of time I had ImageMagick running pretty much continuously on 12-24 cores.  I filed bugs on 12/9/2014, two on 12/11/2014, two on 12/14/2014, two on 12/18/2014, three on 12/19/2014,  one on 12/21/2014, 12/27/2014, three on 12/28/2014, one on 12/31/2014, 1/1/2015, 1/5/2015, two on 1/6/2015, one on 1/7/2015, 1/8/2015, 1/9/2015, 1/11/2015, 1/12/2015, 1/24/2015, and four on 1/25/2015. 

The number of input formats IM supports is pretty broad, so I feel it's more the exception than the rule. I could also have saved some time by limiting the input format, e.g. png:@@ rather than @@. (During one run I was fuzzing some Sun input files on GraphicsMagick and it wound up making its way into the Cineon decoder, where it also found problems.)

Having fuzzed through 50+ projects in Debian, I agree with your assessment of the time duration. The constraint I'm running into is my availability to configure the tests and act on the crashers, certainly not computing resources.

Chris Bisnett

Apr 5, 2015, 12:35:46 PM
to afl-...@googlegroups.com
Charlie Miller included some relevant statistics in a talk a few years ago. You can find the slides here: https://fuzzinginfo.files.wordpress.com/2012/05/cmiller-csw-2010.pdf.

The basic premise is that, given a long enough run time, your fuzzer will find all the bugs it's capable of finding; the question is how to know when you reach that point. His suggestion is to graph the number of "unique" crashes found against the number of iterations; at some point the graph approaches an asymptote at some upper bound. At that point your fuzzer has reached its practical limit for finding new bugs.

This is not a complete answer, since the result is highly dependent on your input set when doing mutation-based fuzzing. Adding new inputs to your set, or finding new paths as AFL does, can lead to new "unique" crashes.

Likely the answer is a weighted metric combining these statistics with the number of new paths found per iteration. That should give a good approximation of when your fuzzer has reached its potential.
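As a sketch of what such a metric could look like (my own illustration, not from the slides), one could compute the crash-discovery rate between successive snapshots, e.g. taken from afl's plot_data, and call the curve flat once the rate falls below a threshold:

```python
def discovery_rate(samples):
    """Given (execs, unique_crashes) snapshots in order, return the number
    of new crashes per million execs in each interval."""
    rates = []
    for (e0, c0), (e1, c1) in zip(samples, samples[1:]):
        rates.append((c1 - c0) / (e1 - e0) * 1_000_000)
    return rates

def looks_plateaued(samples, threshold=0.01):
    """Call the curve flat once the most recent interval's rate drops
    below `threshold` new crashes per million execs (arbitrary cutoff)."""
    rates = discovery_rate(samples)
    return bool(rates) and rates[-1] < threshold
```

The threshold and the per-million scaling are arbitrary choices; the same shape of computation applies to paths found per interval.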

- Chris
--

Ben Nagy

Apr 5, 2015, 7:53:48 PM
to afl-...@googlegroups.com
On Sun, Mar 29, 2015 at 10:55 PM, Hanno Böck <ha...@hboeck.de> wrote:
> And to make this a bit more concrete: If you feel you had relevant
> success in the past after fuzzing more than a day on a reasonably
> current machine can you post the details in a way that I can try to
> reproduce it?

Try fuzzing a more complex format. I'm seeing results with PDF that
don't even start to get interesting until 100-200m tests in. In one
test (ghostscript) I had an exponential growth in unit-crashes-found
from 10m to 100m. This is perfectly intuitive, to me - I am used to
weeks/months of fuzzing per target for complex file formats.

Chris mentioned one of Charlie's talks about the "when to stop" line -
that's a useful thought experiment, but the trouble with asymptotes is
that the more closely you magnify the graph, the less smooth it appears. In
other words, sure it _looks_ like you're hitting your limit, but
really you're just noodling around _near_ it and the next bug might be
around the corner. I have to also note that the "shape" of your
crashes/time curve with afl will be completely different to the shape
with Charlie's eponymous algorithm, because of the genetic element in
afl. Millerfuzz can always hit any bug with a fixed, bug-specific
probability (based on the specific inputs required), but only on
inputs that cover the feature. That means that simple bugs shake out
very early and then you start to decay. AFL _increases_ its
probability of hitting deep bugs with each new depth level, and can
actively (if slowly) evolve to reach code that's not covered by the
starting inputs.
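The decay-vs-growth contrast can be illustrated with a toy survival model (entirely my own simplification; every parameter is arbitrary): a blind fuzzer's miss probability decays at a constant per-test rate, while a coverage-guided fuzzer's per-test hit probability grows as new depth levels unlock:

```python
def blind_miss(n_tests, p=1e-7):
    """Probability a fixed-probability fuzzer has NOT hit the bug after
    n_tests independent tries; decays geometrically from the start."""
    return (1 - p) ** n_tests

def guided_miss(levels, tests_per_level=1_000_000, p0=1e-9, growth=10.0):
    """Probability a coverage-guided fuzzer has NOT hit the bug: each new
    depth level multiplies the per-test hit probability by `growth`, so
    late batches of tests contribute far more than early ones."""
    miss, p = 1.0, p0
    for _ in range(levels):
        miss *= (1 - p) ** tests_per_level
        p *= growth
    return miss
```

In this model the guided fuzzer starts slower but overtakes the blind one as levels accumulate, which matches the intuition above that long runs keep paying off for complex formats.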

There's some kind of function somewhere that would consider reached
code, available code, and new coverage per unit time, but to be honest
it's extremely target specific. Unless you're fuzzing very simple
code, a more practical approach is probably to just keep continually
fuzzing the latest version of your target. I fuzzed Office for, I
don't know, five years or something and neither Microsoft nor I ever
came close to exhausting the available crashes in even a single
format.

Cheers,

ben