GF> Hi,
GF> I am trying to generate pdg/ddg for a very simple program:
GF> #include <stdio.h>
GF> int main(int argc, char *argv[]){
GF> int j = argc + 1;
GF> printf ("hello world!! %d \n", j);
GF> }
GF> $ gcc -o test3 test3.c
GF> $ gcc -v
GF> -> gcc: gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)
GF> $file test3:
GF> -> test3: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
GF> dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not
GF> stripped
GF> But ddg/pdg/cfgsimp construction fails -
GF> -ddg Compute the Data Dependence Graph and output a .dot file.
GF> vine-1.0$ utils/irtrans -ddg t3.dot -frombin test3
GF> snip...already explained warnings (segment warning etc.)
GF> WARNING (CFG): CFG contains BB_Indirect
GF> Fatal error: exception Failure("CFG not well defined")
The last two lines of the output there actually go together. For the
purposes of analyses like converting to SSA, Vine defines a CFG to be
"well-defined" if the following four conditions hold:
1. A BB_Entry node exists.
2. A BB_Exit node exists.
3. No BB_Indirect node exists.
4. Every node is reachable from the BB_Entry.
The program is aborting because the CFG does not satisfy condition 3.
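To make the four conditions concrete, here is a minimal sketch of such a check in Python. It is not Vine's code (Vine's own check is in OCaml); the dictionary-of-successors graph representation and the function name check_well_defined are assumptions made up for this illustration. Only the node names BB_Entry, BB_Exit and BB_Indirect come from Vine.

```python
from collections import deque

def check_well_defined(edges, entry="BB_Entry", exit_="BB_Exit",
                       indirect="BB_Indirect"):
    """Return the list of violated conditions (empty if well-defined).

    `edges` maps each node name to a list of successor names. This
    representation is hypothetical, chosen only for the illustration.
    """
    nodes = set(edges)
    for succs in edges.values():
        nodes.update(succs)

    problems = []
    if entry not in nodes:              # condition 1
        problems.append("no BB_Entry node")
    if exit_ not in nodes:              # condition 2
        problems.append("no BB_Exit node")
    if indirect in nodes:               # condition 3
        problems.append("BB_Indirect node present")

    # Condition 4: breadth-first search from the entry node.
    seen = set()
    if entry in nodes:
        seen.add(entry)
        queue = deque([entry])
        while queue:
            node = queue.popleft()
            for succ in edges.get(node, ()):
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    unreachable = nodes - seen
    if unreachable:
        problems.append("unreachable nodes: " + ", ".join(sorted(unreachable)))
    return problems

# A toy CFG that satisfies all four conditions.
good = {"BB_Entry": ["main"], "main": ["BB_Exit"], "BB_Exit": []}
# The same CFG with an unresolved indirect jump out of main.
bad = {"BB_Entry": ["main"], "main": ["BB_Indirect", "BB_Exit"],
       "BB_Indirect": [], "BB_Exit": []}

print(check_well_defined(good))  # []
print(check_well_defined(bad))   # ['BB_Indirect node present']
```

The second toy graph fails in the same way as your binary: everything else is in order, but one block has an edge to BB_Indirect.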
The problem is that Vine's SSA conversion and CFG simplification
(which are prerequisites for the other kinds of analysis you're
trying) don't work in the presence of unresolved indirect jumps,
because there's no way for Vine to figure out statically where they
jump to. You can see where these are by looking for the CFG nodes that
go to BB_Indirect in the unsimplified CFG. In my version of the
binary, they are:
0x80482cb: ret
0x80482d2: jmp *0x80495b8
0x80482dc: jmp *0x80495bc
0x80482ec: jmp *0x80495c0
0x80482fc: jmp *0x80495c4
0x8048378: call *0x80494d0(,%eax,4)
0x8048394: ret
0x80483bf: call *%eax
0x80483c2: ret
0x80483ed: ret
0x80483f4: ret
0x8048444: call *-0xe8(%ebx,%esi,4)
0x8048459: ret
0x804845d: ret
0x804847b: call *%eax
0x8048489: ret
0x80484a7: ret
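If you want to pick such instructions out of a disassembly listing mechanically, something like the following rough Python sketch would do. It assumes lines in the "0xADDR: mnemonic operands" form used in the list above, with AT&T syntax, where an indirect jmp or call has an operand starting with "*". Raw objdump output has extra columns and would need a different regular expression. The function names are made up for this example.

```python
import re

# Matches lines such as "0x80482d2: jmp *0x80495b8".
LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+):\s+(\S+)\s*(.*)$")

def is_indirect_transfer(mnemonic, operand):
    """True for returns and for jmp/call through a register or memory."""
    if mnemonic.startswith("ret"):
        return True
    if mnemonic.startswith(("jmp", "call")):
        # AT&T syntax marks an indirect target with a leading '*'.
        return operand.lstrip().startswith("*")
    return False

def find_indirect(listing):
    """Return the addresses of instructions whose target is not static."""
    hits = []
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m and is_indirect_transfer(m.group(2), m.group(3)):
            hits.append(int(m.group(1), 16))
    return hits

sample = """\
0x80482cb: ret
0x80482d2: jmp *0x80495b8
0x80483b0: call 0x80482f4
0x80483bf: call *%eax
0x80483c4: push %ebp
"""
print([hex(a) for a in find_indirect(sample)])
# ['0x80482cb', '0x80482d2', '0x80483bf']
```

The direct call and the push in the sample are made-up lines added for contrast; they have statically known successors, so they are not reported.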
Part of the issue here is that when you use irtrans -frombin, it
disassembles every instruction it can find anywhere in the binary. For
a small program like this, the hidden machinery that runs at program
startup and end dominates the actual main function; all of the returns
other than 0x80483ed above come from other functions. The second
through fifth lines are from the PLT; 0x80494d0 is the cleanups array
__DTOR_LIST__, and so on.
If you're actually only interested in the main() function, you'll have
better luck if you remove the other instructions from the IR: save it
to a text file, find the starting and ending addresses of main()
with a disassembler, and then remove the other IR blocks. You'll also
want to comment out both the specials and the jumps associated with
the call and return instructions, since otherwise the call will be an
error and the return will cause a BB_Indirect. (Alternatively you can
include the instructions of the called function as well, leave the
direct jump for the call, and change the return into a direct jump to
the return point; this is equivalent to inlining the call.)
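The block-removal step can be scripted. The Python sketch below is only a starting point and rests on assumptions you should check against your own IR dump: that each instruction's IR begins with a label line of the form "label pc_0x<hex>:", and that the addresses passed in are the real bounds of main() (the ones used here are hypothetical). The function name keep_range and the placeholder statement lines are made up for the example.

```python
import re

# Assumed form of per-instruction address labels in the IR text file.
LABEL_RE = re.compile(r"^\s*label\s+pc_0x([0-9a-fA-F]+):")

def keep_range(ir_lines, start, end):
    """Keep IR lines whose enclosing address label is in [start, end).

    Lines before the first address label (e.g. declarations) are kept.
    Anything after an out-of-range label is dropped until the next
    in-range label, so trailing text may need to be restored by hand.
    """
    kept = []
    keeping = True
    for line in ir_lines:
        m = LABEL_RE.match(line)
        if m:
            addr = int(m.group(1), 16)
            keeping = start <= addr < end
        if keeping:
            kept.append(line)
    return kept

# Placeholder IR: only the label lines matter for the filter.
ir = [
    "declarations...",
    "label pc_0x80482d2:",
    "  statements for a PLT entry...",
    "label pc_0x80483c4:",
    "  statements for the first instruction of main...",
    "label pc_0x80483ed:",
    "  statements for main's ret...",
    "label pc_0x80483f4:",
    "  statements for some other function...",
]

# Hypothetical bounds for main(); take the real ones from a disassembler.
for line in keep_range(ir, 0x80483c4, 0x80483ee):
    print(line)
```

This only handles the address filtering. Commenting out the specials and jumps for the call and the return still has to be done by hand (or with a further pass), as described above.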
GF> a) Is this the correct way of using pdg/ddg/cfgsimp options? if
GF> so, is the problem known or how to fix it? I have not yet started
GF> looking at the ml files but common error messages seem to indicate
GF> towards CFG construction (vine_cfg.ml?) or probably a problem in
GF> x86 to IR translation (??).
Basically the issue here is that the kind of x86 to IR translation
performed by -frombin is not quite the same kind that the static
analysis passes like SSA are intended to work with. They're intended
to be used with an IR where binary-level functions, calls, and returns
have already been translated into Vine functions, calls, and returns,
and indirect jumps have been resolved. For the function-related
transformations, our older code for this wasn't included in the Vine
1.0 release. Resolving indirect jumps is in general a hard problem;
function returns are an easier special case, but we don't have any
general solution that works statically.
GF> b) I would like to understand the code for irtrans (is it part of
GF> VINE/ TEMU releases?). If so, any starting points or pointers to
GF> already existing documentation would be greatly appreciated.
The code for the irtrans program is included as utils/irtrans.ml in
the Vine source code release. It's basically a wrapper that calls a
bunch of other transformations defined in modules in the ocaml/
directory. For instance if you look for "cfgsimp" in the irtrans
source, you'll see that it works by calling Vine_cfg.trace_to_cfg
followed by Vine_dataflow.simplify_graph (in ocaml/vine_dataflow.ml),
and the latter checks that the CFG is well defined as the first step.
Some of these routines have OCamlDoc documentation, but you'll
probably want to read it in conjunction with the source code.
Hope this helps,
-- Stephen