[ANN] Tavor - A generic fuzzing and delta-debugging framework

Markus Zimmermann

Jan 25, 2015, 6:11:16 AM
to golan...@googlegroups.com
Hi gophers!

I'd like to announce Tavor https://github.com/zimmski/tavor

Tavor is a framework for easily implementing and using fuzzing and delta-debugging. Its EBNF-like notation allows you to define file formats, protocols, and other structured data without the need to write source code. Tavor relaxes the definitions of fuzzing and delta-debugging so that its algorithms can be used universally for key-driven testing, model-based testing, simulating user behavior and genetic programming. Tavor is also well-suited for researching new methods without reimplementing basic algorithms.

Basically, if you want to test something that takes structured data, you want to generate that data automatically. Doing it manually is a waste of time. Likewise, if you want to reduce a test case to its bare minimum, you want to do that automatically. In both cases you can use Tavor.

Although the project is still at an early stage, I already use Tavor a lot for my own projects and at work, and I have introduced it at other companies, where it is used successfully. It is currently mostly used as a file format fuzzer, for key-driven testing and for model-based testing, and it has found literally **hundreds** of bugs. The project has some known problems [https://github.com/zimmski/tavor/labels/bug], but its basic features are stable and, as mentioned, used in production.

The whole reason for the project is that I saw how powerful fuzzing and delta-debugging can be for testing and for developers in general. However, I also saw that only a few people utilize these techniques because implementing them is too time-consuming. I want to change that with Tavor. My personal milestone is to reach feature parity with https://github.com/OpenRCE/sulley but with easier and shorter definitions for the fuzzers. I am also thinking about implementing white-box fuzzing such as dynamic symbolic (concolic) execution, but a lot still has to be done before that.


I hope you find this project useful. I appreciate any suggestions and comments, whether they are about the project, the code or the documentation. I am also looking for people who want to join the project or want to create fuzzers for their own projects. In any case, I am happy to help out!

Cheers,
Markus

Markus Zimmermann

Jan 30, 2015, 7:59:04 PM
to golan...@googlegroups.com
I received some feedback that it would be nice to present a short example on how you can use Tavor https://github.com/zimmski/tavor. There is an example at https://github.com/zimmski/tavor/blob/master/doc/complete-example.md which shows how a key-driven approach based on a state machine can be implemented, tested and debugged. It is still a little verbose, so I will give a shorter example for a file format fuzzer.

Imagine that we want to test a command line application which reads in a CSV file with three columns given as an argument.
- "name": Is a string starting with an uppercase letter followed by at least one and at most nine lowercase letters
- "value": Is a number from 0 to 255
- "type": Is "true" or "false"
Testing this program seems very easy. We can immediately write a table-driven test and then write some test inputs manually. Sure, this would lead to decent coverage, but we would have to think and type a lot, and I can guarantee that some testers will miss some corner cases. Furthermore, what if we add a fourth column? All test inputs would have to be rewritten. These are serious problems, and we can make them go away by using Tavor to generate the test inputs.
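
To make the three column rules concrete, here is a minimal Go sketch of a row check. It is not part of Tavor or of the application under test; `validRow` and `namePattern` are hypothetical names chosen for illustration, and the header line is assumed to be handled elsewhere.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// namePattern: one uppercase letter followed by one to nine lowercase letters.
var namePattern = regexp.MustCompile(`^[A-Z][a-z]{1,9}$`)

// validRow reports whether a single data row (without the trailing newline)
// satisfies the rules for the "name", "value" and "type" columns.
func validRow(row string) bool {
	fields := strings.Split(row, ",")
	if len(fields) != 3 {
		return false
	}
	if !namePattern.MatchString(fields[0]) {
		return false
	}
	// Note: strconv.Atoi also accepts forms such as "+5" or "007".
	value, err := strconv.Atoi(fields[1])
	if err != nil || value < 0 || value > 255 {
		return false
	}
	return fields[2] == "true" || fields[2] == "false"
}

func main() {
	rows := []string{"Gbs,221,true", "gbs,221,true", "Gbs,256,true", "Gbs,1,yes"}
	for _, row := range rows {
		fmt.Printf("%-14s valid=%v\n", row, validRow(row))
	}
}
```

A check like this is what hand-written test inputs would have to exercise from both sides, which is exactly the typing work that generated inputs are meant to replace.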

First we define the format of the CSV file using the Tavor format https://github.com/zimmski/tavor/blob/master/doc/format.md

```
START = Columns *(Row)

Delimiter     = ","
LineDelimiter = "\n"

Columns = "name" Delimiter "value" Delimiter "type" LineDelimiter
Row     =  Name  Delimiter  Value  Delimiter  Type  LineDelimiter

Name       = [A-Z] +1,9([a-z])
$Value Int = from: 0,
             to:   255
Type       = "true" | "false"
```


Let's save this file as "csv.tavor". We can now use Tavor to generate test inputs:

tavor --format-file csv.tavor fuzz

Each call of this command generates a random test input such as:

```
name,value,type
```

```
name,value,type
Gbs,221,true
```

```
name,value,type
Tjedr,179,false
Jirrqxyl,120,true
```


We could write the generated CSV data to files and then test our application or we could just use Tavor to test it directly:

tavor --format-file csv.tavor fuzz --exec "app TAVOR_FUZZ_FILE" --exec-argument-type argument --exec-exact-exit-code 0 --result-folder .

This command generates a random test input on each call, executes the application "app" with it and verifies that the exit code is 0. Any other exit code counts as an error, and the test input is then saved in the current folder. We can run this command in a loop to test more than once. We can also choose a different fuzzing strategy with the --strategy option so that the test inputs are generated in a smarter way than purely at random.

Another example where Tavor is extremely useful is reducing data. Imagine that we have a CSV file with hundreds of rows that makes our application crash with a non-zero exit code. We could use the file directly and debug the application, or we could first use Tavor to reduce the file to a smaller version where the crash is still present. Maybe it is just a single row of the hundreds of rows that crashes the application. Reducing the file first will help us debug and save time. Using our format and the file "crash.csv", which crashes our application, we can execute the following command:

tavor --format-file csv.tavor reduce --input-file crash.csv --exec "app TAVOR_DD_FILE" --exec-argument-type argument --exec-exact-exit-code

Tavor will then reduce the rows and even the names until only the data that crashes the application is left.


I hope you liked this example and can incorporate Tavor into your own projects. If you have any questions just send me a mail or ask here in this thread. I am always happy to help out!

Cheers,
Markus

anl...@gmail.com

Jan 30, 2015, 8:19:34 PM
to golan...@googlegroups.com
are there any plans to add other debugging techniques?

In particular
* symbolic execution (like klee) to generate a buggy test input. I don't know the status of llgo project.
* fuzzing transitions of isomorphic automata to find a short divergent test input
* a tool to generate a testvector generator given a code api.

Sean Russell

Feb 1, 2015, 7:47:19 AM
to golan...@googlegroups.com
Very nice, thanks!

--- SER

Markus Zimmermann

Feb 3, 2015, 5:44:46 AM
to golan...@googlegroups.com, anl...@gmail.com
On Saturday, January 31, 2015 at 2:19:34 AM UTC+1, anl...@gmail.com wrote:

are there any plans to add other debugging techniques?

Yes. You can find some of my current ideas/wishes in the project's tracker at https://github.com/zimmski/tavor/issues. If you think that something interesting is missing, please just add it.


In particular
* symbolic execution (like klee) to generate a buggy test input. I don't know the status of llgo project.

Symbolic execution is on my list. However, I want to go one step further to make the implementation more useful for the general developer. I would like to implement dynamic symbolic execution (concolic execution) like Pex [https://pexase.codeplex.com/] does, but in a generic way. You can find the issue here: https://github.com/zimmski/tavor/issues/76. Go would then be the first test subject. But there is still a lot that needs to be implemented. For instance, I do not know of any instrumentation packages for Go, and instrumentation would be the first step for implementing this. Please also note that I do not intend to create a new constraint solver. I think this should be handled externally.

Generating buggy test inputs is one of my top TODO items. In principle, only a new fuzzing filter/fuzzing search strategy is needed. I already implemented a simple one: http://godoc.org/github.com/zimmski/tavor/fuzz/filter#NegativeBoundaryValueAnalysisFilter. Another one that I wish to implement in the coming days simply omits required tokens in a structured way. I also intend to implement mutation techniques like I did here: https://github.com/zimmski/go-mutesting
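
As background for readers unfamiliar with the term, the Go sketch below shows the textbook idea of boundary value analysis for an inclusive integer range such as the "value" column (0 to 255): the edges and their inner neighbours are the interesting valid values, and the first values outside the range are the interesting invalid ones. It assumes `from <= to` and is only an illustration of the general technique, not a description of what the linked Tavor filter does; `boundaryValues` is a hypothetical helper.

```go
package main

import "fmt"

// contains reports whether v is already in values.
func contains(values []int, v int) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

// boundaryValues returns boundary candidates for the inclusive range [from, to]:
// valid holds the edges and their inner neighbours (without duplicates),
// invalid holds the first value below and the first value above the range.
func boundaryValues(from, to int) (valid, invalid []int) {
	for _, v := range []int{from, from + 1, to - 1, to} {
		if v >= from && v <= to && !contains(valid, v) {
			valid = append(valid, v)
		}
	}
	return valid, []int{from - 1, to + 1}
}

func main() {
	valid, invalid := boundaryValues(0, 255)
	fmt.Println("valid:", valid)     // valid: [0 1 254 255]
	fmt.Println("invalid:", invalid) // invalid: [-1 256]
}
```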

Isn't llgo now an official part of LLVM?


* fuzzing transitions of isomorphic automata to find a short divergent test input

You mean in the sense of model checking, that is, checking whether a model meets its specification without executing the implementation? If so, no. This would be interesting, but I do not currently know how to incorporate it into Tavor since it is still a black-box tool and can handle only one definition at a time.

    
* a tool to generate a testvector generator given a code api.
    
This is on my list but does not have a high priority for the public repository. I cannot think of a generic way to do this. If you know a good paper on this topic or have any ideas, please let me know. I implemented some generators at work, but I am not that happy with the implementations and they are currently closed source. However, there are two things on my list that I hope to tackle this month: generators for CLI commands (generating calls to execute CLI programs with their options and arguments) and a helper to test HTTP REST services.


In general, my current focus is on extending the Tavor format https://github.com/zimmski/tavor/blob/master/doc/format.md and making Tavor as bug-free as possible. Allowing people to define what their data looks like is, in my opinion, more important. Implementing smart heuristics for better model coverage and fewer generations to exploit the definitions can be done later.

Markus Zimmermann

Oct 8, 2015, 5:51:48 PM
to golang-nuts
In case someone is subscribed to this thread: I announced a new version of Tavor and a short report on experiences with users at https://groups.google.com/forum/#!topic/golang-nuts/Me8ELFyIqj8