LLVM IR Without a CFG


steven.raf...@gmail.com

Aug 27, 2014, 11:31:05 AM
to mcsem...@googlegroups.com
The current pipeline makes a lot of sense if I want to produce LLVM IR from a CFG that I get through some external analysis. But to perform better CFG recovery I need the instruction semantics. What functions does mcsema provide if I need to take the raw output of a decoder and retrieve LLVM IR? I tried reading through the files in https://github.com/turnersr/mcsema/blob/master/mc-sema/cfgToLLVM, in particular https://github.com/turnersr/mcsema/blob/master/mc-sema/cfgToLLVM/raiseX86.cpp, but it was not clear to me how I could incrementally produce expressions and statements in the IR if I just had a sequence of bytes representing code.




steven.raf...@gmail.com

Aug 27, 2014, 5:32:26 PM
to mcsem...@googlegroups.com, steven.raf...@gmail.com
Just to be clearer, I would like to use the simple MC layer as a foundation for more sophisticated CFG recovery. One approach to CFG recovery that fits mcsema's current design would be to first bootstrap with recursive descent to get an LLVM module, then do alias analysis via symbolic/concrete execution or abstract interpretation on the LLVM representation, then regenerate the object file, run recursive descent again, and repeat. However, this assumes that the initial recursive descent strategy is correct. It would be much better to use the semantics of the instructions to inform the method used by the disassembler rather than start out with a degenerate, syntactically derived representation and then attempt to patch it up.
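
To make the idea of semantics-informed disassembly concrete, here is a minimal sketch on a made-up toy ISA. Nothing in it is x86 or an mcsema API: the opcodes, the `SIZES` table, and the `recover` function are all hypothetical. It runs recursive descent while tracking which registers hold known constants, so an indirect jump can be resolved during disassembly instead of being patched up afterwards. As a simplification, each address is visited only once, with whatever register state reaches it first.

```python
# Toy ISA (hypothetical, for illustration only; not x86, not mcsema):
#   00          HALT
#   01 rr ii    MOVI r, imm     ; r = imm
#   02 rr ii    ADDI r, imm     ; r = r + imm
#   03 tt       JMP  target     ; direct jump
#   04 rr       JMPR r          ; indirect jump through register r
#   05 rr tt    JZ   r, target  ; conditional jump, falls through otherwise
SIZES = {0x00: 1, 0x01: 3, 0x02: 3, 0x03: 2, 0x04: 2, 0x05: 3}


def recover(code, entry=0):
    """Return (instruction addresses, jump edges, unresolved indirect jumps)."""
    seen, edges, unresolved = set(), set(), set()
    work = [(entry, {})]  # (address, registers known to hold constants)
    while work:
        pc, regs = work.pop()
        while pc not in seen:
            op = code[pc] if pc < len(code) else None
            if op not in SIZES or pc + SIZES[op] > len(code):
                break  # undecodable or truncated: stop this path
            seen.add(pc)
            args = code[pc + 1 : pc + SIZES[op]]
            if op == 0x00:  # HALT ends the path
                break
            if op == 0x01:  # MOVI makes the register a known constant
                regs[args[0]] = args[1]
            elif op == 0x02:  # ADDI keeps a known constant known
                if args[0] in regs:
                    regs[args[0]] = (regs[args[0]] + args[1]) & 0xFF
            elif op == 0x03:  # JMP: follow the direct target
                edges.add((pc, args[0]))
                work.append((args[0], dict(regs)))
                break
            elif op == 0x04:  # JMPR: resolvable only if the register is known
                if args[0] in regs:
                    target = regs[args[0]]
                    edges.add((pc, target))
                    work.append((target, dict(regs)))
                else:
                    unresolved.add(pc)
                break
            elif op == 0x05:  # JZ: queue the target, then fall through
                edges.add((pc, args[1]))
                work.append((args[1], dict(regs)))
            pc += SIZES[op]
    return seen, edges, unresolved


# MOVI r0, 8; ADDI r0, 1; JMPR r0; <junk byte>; HALT
program = bytes([0x01, 0, 8, 0x02, 0, 1, 0x04, 0, 0xFF, 0x00])
seen, edges, unresolved = recover(program)
print(sorted(seen), sorted(edges), sorted(unresolved))
```

A purely syntactic recursive descent would stop at the `JMPR` and either miss the `HALT` at offset 9 or try to decode the junk byte at offset 8. Tracking constants through the instruction semantics finds the real target. A real implementation would need a proper abstract domain and a fixpoint over merged states rather than first-visit-wins.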

Thanks,
Rafael 

endeavor

Aug 28, 2014, 12:12:59 AM
to mcsem...@googlegroups.com
There are not enough upvotes in the world for this. I believe the single most versatile function that could come out of this framework would be one whose sole argument is a series of bytes for a given architecture and whose result is LLVM IR. Third parties can write the loader, the memory model, and the logic, but this project, with traction, stands to make its contribution with a decently performing (which would be good enough) native-to-IL translator in an imperative language for the masses.
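
As a rough illustration of the shape such a function could take, here is a sketch for a made-up two-register toy ISA. It is not mcsema's interface: the `lift` name, the opcodes, and the calling convention (registers passed in as `i32` parameters, `r0` returned on `HALT`) are all assumptions made for the example. It handles straight-line code only and returns textual LLVM IR.

```python
# Toy ISA (hypothetical, for illustration only):
#   00          HALT            ; return r0
#   01 rr ii    MOVI r, imm     ; r = imm
#   02 rr ii    ADDI r, imm     ; r = r + imm


def lift(code, name="lifted"):
    """Translate straight-line toy machine code into textual LLVM IR."""
    # Each register maps to the SSA value (or constant) it currently holds.
    regs = {0: "%r0", 1: "%r1"}
    body = []
    pc = 0
    while pc < len(code) and code[pc] != 0x00:
        op = code[pc]
        if op not in (0x01, 0x02) or pc + 3 > len(code):
            raise ValueError(f"cannot decode byte {op:#04x} at offset {pc}")
        r, imm = code[pc + 1], code[pc + 2]
        if r not in regs:
            raise ValueError(f"unknown register {r} at offset {pc}")
        if op == 0x01:
            # MOVI needs no instruction: the register now names a constant.
            regs[r] = str(imm)
        else:
            # ADDI defines a fresh SSA value and rebinds the register to it.
            tmp = f"%t{len(body)}"
            body.append(f"  {tmp} = add i32 {regs[r]}, {imm}")
            regs[r] = tmp
        pc += 3
    body.append(f"  ret i32 {regs[0]}")
    header = f"define i32 @{name}(i32 %r0, i32 %r1) {{"
    return "\n".join([header, "entry:", *body, "}"])


# ADDI r0, 5; HALT
print(lift(bytes([0x02, 0, 5, 0x00])))
```

Keeping the loader and memory model out of the signature is what makes this composable: the caller decides where the bytes come from and what to do with the resulting IR.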

