Toggling Control Flow Tracing


Benjamin Kallus

Jul 5, 2023, 11:17:02 PM
to afl-users
I'm working on fuzzing HTTP parsers. Unfortunately, most of these parsers are embedded in HTTP servers and proxies and are not always separated from the rest of the codebase, so these programs must be fuzzed end-to-end.

I think a reasonable approach might look something like this:
1. Send an AFL-generated request to the server
2. Signal the server out-of-band to begin control flow tracing
3. Receive a response from the server
4. Signal the server out-of-band to stop control flow tracing
5. Collect the control flow trace from the server

The difficulty with this approach is with toggling the control flow tracing. Does AFL (or any other control flow tracer) support something like this?

Thanks!

Brandon Perry

Jul 6, 2023, 8:21:35 AM
to afl-...@googlegroups.com
I’ve had luck fuzzing general servers end-to-end with preeny and minimal code changes.

--
You received this message because you are subscribed to the Google Groups "afl-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to afl-users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/afl-users/7b6be3f0-8f16-46ce-a4f3-6c88fd2a9fb2n%40googlegroups.com.

Connor Shugg

Jul 6, 2023, 9:42:44 AM
to afl-...@googlegroups.com
Hi,

I agree with Preeny being a good starting option, but there are better options out there that are built specifically for "de-socketing" a web server. Like Brandon said, Preeny often requires code changes to the target server. It doesn't support the de-socketing of certain system calls a web server typically makes, hence the required code changes. A few more options to consider:
  • libdesock - probably one of the best solutions out there right now, depending on your use case. It emulates the entire network stack in user space and is built with AFL/AFL++ in mind.
  • gurthang - something I developed for my Master's degree that tackles many of the hurdles you're probably facing. It uses the underlying network stack on your system (i.e. it doesn't emulate it), and spawns a few server-side threads to handle passing your input to the server via an internal socket. Arguably a more "natural" solution than libdesock.
I can't speak to libdesock as much as I can to my own library, but I'd imagine the benefits are the same: these should let you build your server - without source code modification - and pass your HTTP requests through stdin to the server for processing. I'm more than happy to help out more if you wind up wanting to try out my library.

As for fuzzing only the parser: Preeny and the other libraries are great for fuzzing the server as-is (end-to-end). If you truly want to isolate the parsing code, you may want to take a look at AFL++'s persistent mode and deferred initialization. These may allow you to narrow down the code region that does the actual parsing for fuzzing. Some ideas on what you can do with this:
  • Use deferred initialization to have AFL++ skip past all the standard web server initialization code when re-running the target program for new payloads. This'll allow each run of the target during the fuzzing campaign to start closer to where the parsing code is.
  • Wrap the persistent mode loop code around the HTTP parsing code to have AFL++ focus on only that section for fuzzing.
In my experience, you can easily fuzz a server's HTTP parser even if you're fuzzing the program end-to-end. De-socketing libraries will ensure your HTTP payload is delivered straight to the parser, which, if it's buggy, should fail given the right payload.
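
For reference, the deferred initialization and persistent mode ideas above might look roughly like the sketch below. This is only an illustration under assumptions: `parse_request_line()`, `server_setup()` and `fuzz_parser()` are hypothetical stand-ins for a real server's code, and the `#ifndef` block provides stdin-based fallbacks so the file also builds without afl-cc. The `__AFL_*` macros themselves are the ones AFL++ documents for persistent mode.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks so the sketch also builds with a plain C compiler (no afl-cc):
 * one test case is read from stdin per loop iteration instead. */
#ifndef __AFL_FUZZ_TESTCASE_LEN
static ssize_t fuzz_len;
static unsigned char fuzz_buf[65536];
#define __AFL_FUZZ_TESTCASE_LEN fuzz_len
#define __AFL_FUZZ_TESTCASE_BUF fuzz_buf
#define __AFL_FUZZ_INIT() typedef int afl_stub_fuzz_init_t
#define __AFL_LOOP(n) ((fuzz_len = read(0, fuzz_buf, sizeof fuzz_buf)) > 0)
#define __AFL_INIT() ((void)0)
#endif

__AFL_FUZZ_INIT();

/* Hypothetical stand-in for the server's real parser: accepts
 * "<METHOD> <target> HTTP/1.x\r\n" and rejects everything else. */
int parse_request_line(const unsigned char *buf, size_t len) {
  const unsigned char *sp1 = memchr(buf, ' ', len);
  if (!sp1 || sp1 == buf) return -1;            /* missing or empty method */
  size_t rest = len - (size_t)(sp1 + 1 - buf);
  const unsigned char *sp2 = memchr(sp1 + 1, ' ', rest);
  if (!sp2 || sp2 == sp1 + 1) return -1;        /* missing or empty target */
  size_t tail = len - (size_t)(sp2 + 1 - buf);
  if (tail < 10) return -1;                     /* need "HTTP/1.x\r\n" */
  if (memcmp(sp2 + 1, "HTTP/1.", 7) != 0) return -1;
  if (sp2[8] != '0' && sp2[8] != '1') return -1;
  if (sp2[9] != '\r' || sp2[10] != '\n') return -1;
  return 0;
}

/* Hypothetical placeholder for the server's expensive startup work. */
static void server_setup(void) {}

/* Would be called from the target's main() in place of its accept loop. */
int fuzz_parser(void) {
  server_setup();
  __AFL_INIT();                 /* deferred init: fork server starts here */
  unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF; /* fetch after __AFL_INIT */
  while (__AFL_LOOP(10000)) {   /* persistent mode: many inputs per process */
    ssize_t len = __AFL_FUZZ_TESTCASE_LEN;
    if (len > 0) parse_request_line(buf, (size_t)len);
  }
  return 0;
}
```

Note that this does require a per-target source change (finding the parser entry point and wrapping it), which is the cost of isolating the parser this way.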

Hope this helps!
Connor

Benjamin Kallus

Jul 6, 2023, 11:26:19 AM
to afl-users
Thank you for your thorough responses.

My primary concern about desocketing is that some of our servers take input on multiple sockets at once (because of, for example, admin interfaces), which as far as I can tell is not supported by preeny or libdesock.
That said, we only care about talking on one socket at a time, so I don't think we need anything as sophisticated as Comux. Other than speed, is there any reason to desocket when toggling instrumentation is an option?

Because we are a 2 person team and we have 26 target HTTP implementations, we want to avoid target-specific source code modifications when possible. Thus, any isolation of parser code must be done with a reasonably general technique. The simplest thing is just sticking `__AFL_COVERAGE_ON()` and `__AFL_COVERAGE_OFF()` in signal handlers and fuzzing the servers over a Docker network with signals sent using `docker kill`. I've tested this and it works well on our simplest target server.
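
A minimal sketch of the signal-handler idea, for anyone following along. The choice of SIGUSR1/SIGUSR2, the function name `install_coverage_toggles()`, and the stub macros in the `#ifndef __AFL_COMPILER` branch (there only so the file builds without afl-cc) are all assumptions for illustration, not necessarily what was actually used. It also assumes the AFL++ coverage toggles are safe to call from a signal handler, which I have not verified in general.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

#ifndef __AFL_COMPILER
/* Stubs for builds without afl-cc: they only record the requested state. */
static volatile sig_atomic_t stub_tracing = 0;
#define __AFL_COVERAGE() typedef int afl_stub_cov_t
#define __AFL_COVERAGE_START_OFF() typedef int afl_stub_start_off_t
#define __AFL_COVERAGE_ON() (stub_tracing = 1)
#define __AFL_COVERAGE_OFF() (stub_tracing = 0)
#endif

__AFL_COVERAGE();           /* opt in to AFL++ selective coverage */
__AFL_COVERAGE_START_OFF(); /* collect nothing until told to */

/* Assumed mapping: SIGUSR1 starts tracing, SIGUSR2 stops it. */
static void on_start_signal(int sig) {
  (void)sig;
  __AFL_COVERAGE_ON();
}

static void on_stop_signal(int sig) {
  (void)sig;
  __AFL_COVERAGE_OFF();
}

/* Call once early in the target's main(). Returns 0 on success. */
int install_coverage_toggles(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART; /* avoid spurious EINTR in the server's I/O */

  sa.sa_handler = on_start_signal;
  if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

  sa.sa_handler = on_stop_signal;
  if (sigaction(SIGUSR2, &sa, NULL) != 0) return -1;
  return 0;
}
```

With handlers like these, the driver side would send something like `docker kill --signal=SIGUSR1 <container>` before reading the response and `docker kill --signal=SIGUSR2 <container>` after it, where `<container>` is whatever the target container is called.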

The next challenge is extracting the coverage table while the program is still running. Does AFL++ provide any support for live coverage extraction? If not, what do you estimate is the easiest way to achieve this? My current best guess is modifying `afl-showmap`.

Thanks,
Ben

Connor Shugg

Jul 7, 2023, 10:13:49 AM
to afl-...@googlegroups.com
Hi Ben,

If speed isn't a problem, I don't think there's any glaring reason to desocket if you already have a good way to get the fuzzed payload to the target server. These libraries give you a direct way to dynamically load into the web server binary, which makes it easy to intercept stdin and route it through a network connection to yourself, but it sounds like you've got this part figured out.

Your 'docker kill' solution sounds like a good way to enable/disable instrumentation collection. Another idea, if you're interested: each of the 26 different HTTP servers must first read the bytes from the socket using some system call wrapper (read(), recv(), etc.). Could you write an LD_PRELOAD library that interposes on that function (using dlsym() to find the real one) and runs some extra code that, if the conditions are correct, calls __AFL_COVERAGE_ON()? You would also need to interpose on some call that is commonly made after parsing, to run __AFL_COVERAGE_OFF(). This makes the server itself the one responsible for enabling/disabling its instrumentation, at a time it decides. This may be a little more accurate than delivering a process signal in terms of when the instrumentation collection should begin.
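
To make the LD_PRELOAD idea a bit more concrete, here is a rough sketch under several assumptions: it hooks `recv()` to start tracing and `send()` to stop it (a real server may use `read()`, `writev()`, etc. instead), it assumes the AFL++ runtime functions behind the macros are named `__afl_coverage_on` / `__afl_coverage_off` and are visible to the preloaded library, and `shim_tracing_on` is a made-up debugging variable. Treat it as a starting point, not a tested shim.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for RTLD_NEXT; must come before any include */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumed names of the AFL++ runtime toggles. Declared weak so they are
 * simply NULL when the target does not provide them. */
extern void __afl_coverage_on(void) __attribute__((weak));
extern void __afl_coverage_off(void) __attribute__((weak));

/* Hypothetical mirror of the last toggle, handy for debugging the shim. */
int shim_tracing_on = 0;

static void tracing_start(void) {
  shim_tracing_on = 1;
  if (__afl_coverage_on) __afl_coverage_on();
}

static void tracing_stop(void) {
  shim_tracing_on = 0;
  if (__afl_coverage_off) __afl_coverage_off();
}

typedef ssize_t (*recv_fn)(int, void *, size_t, int);
typedef ssize_t (*send_fn)(int, const void *, size_t, int);

/* Interposed recv(): pass through, then start tracing once data arrived,
 * so the parsing that follows is what gets recorded. */
ssize_t recv(int fd, void *buf, size_t len, int flags) {
  static recv_fn real_recv;
  if (!real_recv) real_recv = (recv_fn)dlsym(RTLD_NEXT, "recv");
  if (!real_recv) { errno = ENOSYS; return -1; }
  ssize_t n = real_recv(fd, buf, len, flags);
  if (n > 0) tracing_start();
  return n;
}

/* Interposed send(): the response is about to go out, so parsing is
 * assumed to be finished and tracing is stopped first. */
ssize_t send(int fd, const void *buf, size_t len, int flags) {
  static send_fn real_send;
  if (!real_send) real_send = (send_fn)dlsym(RTLD_NEXT, "send");
  if (!real_send) { errno = ENOSYS; return -1; }
  tracing_stop();
  return real_send(fd, buf, len, flags);
}
```

Built as a shared object (e.g. `cc -shared -fPIC shim.c -o shim.so -ldl`, file names hypothetical) it would be loaded with `LD_PRELOAD=./shim.so` in front of the server command. The "if the conditions are correct" part, such as checking that `fd` is the HTTP socket rather than an admin interface, is left out here.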

As for live coverage extraction, I'm not familiar with any features that do this, but someone else may be. Modifying afl-showmap.c looks like a good way to do this, although you may want to instead modify afl-fuzz itself to collect this for you after each test case execution.
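
For a sense of what reading the map from outside the target could involve, here is a small sketch. It is not how afl-showmap works internally; it only assumes that the coverage map lives in a System V shared memory segment (AFL passes its id to the target in the `__AFL_SHM_ID` environment variable) and that the map is 65536 bytes, the classic default, which AFL++ can change. `count_nonzero_entries()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Assumption: classic default map size; AFL++ builds may use another. */
#define ASSUMED_MAP_SIZE 65536

/* Attach read-only to the shared coverage map while the target keeps
 * running and count how many entries have been hit so far.
 * Returns -1 if the segment cannot be attached. */
long count_nonzero_entries(int shm_id, size_t map_size) {
  const unsigned char *map = shmat(shm_id, NULL, SHM_RDONLY);
  if (map == (const unsigned char *)-1) return -1;

  long hits = 0;
  for (size_t i = 0; i < map_size; i++) {
    if (map[i] != 0) hits++;
  }

  shmdt(map); /* detach only; the segment itself stays alive */
  return hits;
}
```

A snapshot taken this way races with the running server, so it is only meaningful at a point where tracing has been switched off.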

Thanks,
Connor

Ben Kallus

Jul 10, 2023, 2:56:11 PM
to afl-...@googlegroups.com
Thank you. That does sound more accurate, but we're okay introducing a little noise into the traces.

I've modified afl-showmap for this purpose in my AFL++ fork, and it seems to work. If anyone has opinions on whether this feature should be merged upstream, please comment here: https://github.com/AFLplusplus/AFLplusplus/issues/1806

-Ben
