Feature request: single-file fuzzer dictionaries

Jakub Wilk

May 2, 2015, 4:31:08 PM

to afl-...@googlegroups.com

To create a fuzzer dictionary, you currently have to put each token in
a separate file, which I find rather cumbersome. Could afl-fuzz have an
option to read the token list from a single file?

--
Jakub Wilk

Michal Zalewski

May 2, 2015, 5:19:06 PM

to afl-users

> To create a fuzzer dictionary, you currently have to put each token in
> a separate file, which I find rather cumbersome. Could afl-fuzz have an
> option to read the token list from a single file?

I didn't do it mostly because you'd need an escaping scheme for
control characters, which isn't necessarily more convenient for binary
formats, and there would be other common ambiguities that crop up when
you author config files in text editors - did the user mean that
literal ^M or did they just edit the dictionary file on Windows? did
they mean to include that trailing whitespace or not?

Any ideas on how to solve that neatly?

/mz

Ben Nagy

May 2, 2015, 7:32:39 PM

to afl-...@googlegroups.com

On Sun, May 3, 2015 at 9:18 AM, Michal Zalewski <lca...@gmail.com> wrote:
> Any ideas on how to solve that neatly?

JSON?

*ducks*

Michal Zalewski

May 2, 2015, 7:52:45 PM

to afl-users

> JSON?

XML... or...

http://www-01.ibm.com/support/knowledgecenter/SS9H2Y_6.0.0/com.ibm.dp.xm.doc/json_jsonx.html

/mz

Ben Nagy

May 3, 2015, 5:46:16 AM

to afl-...@googlegroups.com

On Sun, May 3, 2015 at 11:52 AM, Michal Zalewski <lca...@gmail.com> wrote:
>> JSON?
>
> XML... or...

All of the _pro_ fuzzing frameworks use xml, bro.

Jakub Wilk

May 4, 2015, 3:34:50 PM

to afl-...@googlegroups.com

* Michal Zalewski <lca...@gmail.com>, 2015-05-02, 14:18:

Well, but you have the same problem with the one-token-per-file scheme.
For example, tokens in the HTML dictionary have trailing newlines, which
is likely unintentional. (Although I guess it doesn't hurt much either...)

>Any ideas on how to solve that neatly?

I have two different proposals:

1) Just split the dictionary file using "\n" as the separator. No
escaping scheme, no special treatment of control characters. People who
need "\n" in their tokens will have to use the old one-token-per-file
scheme. Windows users will have to get a better OS.

2) Split the dictionary file using "\n" as the separator. Strip trailing
whitespace from each line. If the line starts and ends with double-quote
characters, treat it as a C string; otherwise treat it as a raw string.
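
The second proposal can be sketched in Python roughly as follows. This is only an illustration, not afl-fuzz code: the names parse_tokens and _unescape are made up here, blank lines are assumed to carry no token, and only a small subset of C escapes is handled.

```python
import re

# Assumption: only these C escapes are handled; a full C-string parser
# would also need octal escapes, \a, \b, \f, \v and so on.
_SIMPLE_ESCAPES = {
    b"n": b"\n", b"r": b"\r", b"t": b"\t", b"0": b"\x00",
    b"\\": b"\\", b'"': b'"',
}
# A backslash followed by either xNN (two hex digits) or any single byte.
_ESCAPE_RE = re.compile(rb"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def _unescape(body):
    """Decode the supported escapes inside a quoted line."""
    def repl(match):
        esc = match.group(1)
        if len(esc) == 3:                      # the xNN form
            return bytes([int(esc[1:], 16)])
        if esc in _SIMPLE_ESCAPES:
            return _SIMPLE_ESCAPES[esc]
        raise ValueError("unsupported escape: \\" + esc.decode("latin-1"))
    return _ESCAPE_RE.sub(repl, body)


def parse_tokens(data):
    """Split on newlines, strip trailing whitespace, and treat lines that
    start and end with a double quote as escaped strings."""
    tokens = []
    for line in data.split(b"\n"):
        line = line.rstrip()                   # also drops a trailing \r
        if not line:
            continue                           # assumption: skip blank lines
        if len(line) >= 2 and line.startswith(b'"') and line.endswith(b'"'):
            tokens.append(_unescape(line[1:-1]))
        else:
            tokens.append(line)                # raw string, used verbatim
    return tokens
```

Note that a raw token cannot end in whitespace under this scheme; such tokens have to use the quoted form.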

--
Jakub Wilk

Michal Zalewski

May 4, 2015, 3:51:33 PM

to afl-users

I like the quoted C-string idea, it lets us keep some metadata in the
form of name = value pairs.

I'm not super-convinced that escaping binary tokens is a lot easier
than a for loop to split text tokens into files, but don't care
strongly. Will put something together once I fix the deduping code
(see the other thread).
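
For reference, the "for loop" approach amounts to something like the Python sketch below. The function name, the token_NNNN file naming, and the handling of blank lines and trailing \r are all assumptions made for illustration.

```python
from pathlib import Path


def split_tokens_into_dir(list_file, out_dir):
    """Write each non-empty line of list_file to its own file in out_dir.

    Returns the number of token files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for line in Path(list_file).read_bytes().split(b"\n"):
        token = line.rstrip(b"\r")     # assumption: tolerate CRLF input
        if not token:
            continue                   # assumption: skip blank lines
        # Hypothetical naming scheme; any unique file name would do.
        (out / "token_{:04d}".format(count)).write_bytes(token)
        count += 1
    return count
```

The resulting directory can then be passed to afl-fuzz with -x, as with any one-token-per-file dictionary.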

/mz

Michal Zalewski

May 5, 2015, 2:28:50 AM

to afl-users

This didn't make it into 1.76b, but is coming soon. My current plan is
to do two things:

1) If -x points to a directory, no changes compared to current behavior.

2) If -x points to a file, we accept a file in the format of:

ignored_string="hex_escaped_keyword"

...with newlines and lines starting with # ignored and unescaped
control chars not tolerated. I'm slightly tempted to make it more
expressive (for example, allow dictionaries to have several coverage
levels to choose from), but not sure if anyone would have any use for
it. Thoughts?

/mz

Ben Nagy

May 5, 2015, 5:01:13 AM

to afl-...@googlegroups.com

On Tue, May 5, 2015 at 6:28 PM, Michal Zalewski <lca...@gmail.com> wrote:
>
> ignored_string="hex_escaped_keyword"
>
> ...with newlines and lines starting with # ignored and unescaped
> control chars not tolerated.

Uh.. could you provide an actual example of valid syntax? I'm not 100%
sure I understand your proposed format - this is not a standard format
of any kind, right?

> I'm slightly tempted to make it more
> expressive (for example, allow dictionaries to have several coverage
> levels to choose from), but not sure if anyone would have any use for
> it. Thoughts?

I am definitely guilty of having too many tokens, but I am not
convinced that letting me choose which ones get used and how often is
going to be good. I think it will just lead to me making decisions
about stuff that the fuzzer should probably be working out for me (or
leaving to chance). Just my 0.02...

Cheers,

ben

Jakub Wilk

May 5, 2015, 8:31:19 AM

to afl-...@googlegroups.com

* Michal Zalewski <lca...@gmail.com>, 2015-05-04, 23:28:

>2) If -x points to a file, we accept a file in the format of:
>
>ignored_string="hex_escaped_keyword"

Is the ignored_string= part going to be obligatory? (I hope it won't.)

--
Jakub Wilk

Michal Zalewski

May 5, 2015, 8:47:18 PM

to afl-users

See 1.77b. You can pass files to -x. The file can look like this:

# ignored comment
# ignored comment

tag_iframe="<iframe>"
attr_src = " src="
gibberish = "\x0A\x0D\x00\"\\"
"you can also omit names if you really wanna"

Only \xNN, \", and \\ are supported for escaping values. Quotes are
mandatory to prevent snafus and whitespace / newline issues - I think
this is a lot more bulletproof than picking some interpretation
randomly and hoping for the best. Stray control or high-bit characters
are not allowed. The names are ignored, but are meant to provide some
bookkeeping information for the user, akin to filenames for directory
dictionaries.
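
The rules above can be sketched as a small Python parser. This is an approximation written from the description, not the afl-fuzz implementation: the function names are invented, "control or high-bit" is assumed to mean any byte outside 0x20..0x7E, surrounding whitespace (including a trailing \r) is assumed to be stripped, and an unescaped quote inside a value is assumed to be an error.

```python
import re

_HEX2 = re.compile(rb"[0-9A-Fa-f]{2}")


def _decode_value(body, lineno):
    """Decode \\xNN, \\" and \\\\ escapes; reject everything else."""
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c < 0x20 or c > 0x7E:               # assumption: printable ASCII only
            raise ValueError("line %d: stray control or high-bit byte" % lineno)
        if c == 0x22:                          # assumption: bare quote is an error
            raise ValueError("line %d: unescaped quote in value" % lineno)
        if c != 0x5C:                          # ordinary byte, copy through
            out.append(c)
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt in (b"\\", b'"'):
            out += nxt
            i += 2
        elif nxt == b"x" and _HEX2.fullmatch(body[i + 2:i + 4]):
            out.append(int(body[i + 2:i + 4], 16))
            i += 4
        else:
            raise ValueError("line %d: unsupported escape" % lineno)
    return bytes(out)


def parse_dict_file(data):
    """Return a list of (name, value) pairs; name is b"" when omitted."""
    entries = []
    for lineno, raw in enumerate(data.split(b"\n"), 1):
        line = raw.strip()
        if not line or line.startswith(b"#"):
            continue                           # blank lines and comments
        start = line.find(b'"')
        if start < 0 or start == len(line) - 1 or not line.endswith(b'"'):
            raise ValueError("line %d: value must be double-quoted" % lineno)
        prefix = line[:start].rstrip()
        if prefix and not prefix.endswith(b"="):
            raise ValueError("line %d: expected name = \"value\"" % lineno)
        name = prefix[:-1].rstrip() if prefix else b""
        entries.append((name, _decode_value(line[start + 1:-1], lineno)))
    return entries
```

Run on the example file above, the gibberish line decodes to five bytes: 0x0A, 0x0D, 0x00, a double quote, and a backslash.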

I was thinking of allowing a subset of the dictionary to be selected
based on some meta-information attached to names, e.g. -x some_file@1
means selecting all names that end with @1, etc, but currently don't
see that many use cases.
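
As a rough illustration of that idea (purely hypothetical; nothing like this is implemented, and both function names are made up), the -x argument could be split at the last @ and the suffix matched against entry names:

```python
def split_dict_spec(spec):
    """Split "some_file@1" into ("some_file", "@1").

    Returns (spec, None) when there is no "@" in the argument.
    """
    path, sep, level = spec.rpartition("@")
    if not sep:
        return spec, None
    return path, "@" + level


def select_entries(entries, suffix):
    """Keep the (name, value) pairs whose name ends with suffix.

    With suffix None, every entry is kept.
    """
    if suffix is None:
        return list(entries)
    return [(name, value) for name, value in entries if name.endswith(suffix)]
```

Unnamed entries would never match a suffix under this sketch, so a real design would have to decide whether they belong to every level or to none.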

PS. There is no detection of dupes.

/mz
