From: Hans-Peter Diettrich <DrDiettri...@aol.com>
Subject: Re: Making C compiler generate obfuscated code
Date: Wed, 22 Dec 2010 17:12:06 +0100
Organization: Compilers Central
References: <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org>
X-Trace: gal.iecc.com 1293120795 82252 126.96.36.199 (23 Dec 2010 16:13:15 GMT)
NNTP-Posting-Date: Thu, 23 Dec 2010 16:13:15 +0000 (UTC)
Keywords: code, design
Posted-Date: 23 Dec 2010 11:13:15 EST
Torben Ægidius Mogensen wrote:
>> In practice such interruptions of the control flow make automatic
>> disassembling almost impossible. Instead a good *interactive*
>> disassembler is required (as I was writing when I came across above
>> tricks), and time consuming manual intervention and analysis is
>> required with almost every break in the control flow. The mix of data
>> and instructions not only makes it impossible to generate an assembler
>> listing, but also hides the use of memory locations (variables or
>> constants), with pointers embedded in the inlined parameter
>> blocks. Now tell me how a decompiler or other analysis tool should
>> deal with such constructs, when already the automatic separation of
>> code and data is impossible.
> Using jump tables and the like is, indeed, going to make unobfuscation
> hard. Especially if the tables change dynamically.
In the observed cases the presence of jump tables was not known in
advance, and neither was the structure or size of the data block that
follows the call.
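
To illustrate why an unknown data block after a call defeats automatic disassembly, here is a minimal sketch. Everything in it is an assumption made for illustration: the four-opcode instruction set, the byte values, and the call target 0x40 are hypothetical and are not the code discussed in this thread. A plain linear sweep decodes the inline parameter bytes as instructions and loses the real LOAD; only when told (from manual analysis of the callee) how many bytes the callee consumes does it recover the right listing.

```python
# Hypothetical toy instruction set: opcode -> (mnemonic, total length in bytes).
OPCODES = {
    0x01: ("NOP", 1),
    0x02: ("LOAD", 2),   # LOAD imm8
    0x03: ("CALL", 2),   # CALL addr8
    0x04: ("RET", 1),
}

def linear_sweep(code, inline_params=None):
    """Decode bytes front to back.

    inline_params maps a call target to the number of data bytes the callee
    consumes after the CALL. The sweep cannot derive this on its own; it has
    to be supplied from outside, e.g. by someone who analysed the callee.
    """
    inline_params = inline_params or {}
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            out.append((pc, "???"))      # undecodable byte
            pc += 1
            continue
        name, size = OPCODES[op]
        out.append((pc, name))
        if name == "CALL" and pc + 1 < len(code):
            skip = inline_params.get(code[pc + 1], 0)
            if skip:
                out.append((pc + size, f"DATA[{skip}]"))
            pc += size + skip            # resume after the parameter block
        else:
            pc += size
    return out

# CALL 0x40, a 3-byte inline parameter block (02 04 02), then LOAD 7, RET.
program = bytes([0x03, 0x40, 0x02, 0x04, 0x02, 0x02, 0x07, 0x04])

naive = linear_sweep(program)
informed = linear_sweep(program, inline_params={0x40: 3})

print(naive)     # parameter bytes misread as LOADs, real LOAD lost
print(informed)  # CALL, DATA[3], LOAD, RET
```

In the naive listing the sweep also runs out of step with the real instruction boundaries, so the damage is not confined to the data block itself.
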
> You might be able to get around this by symbolic execution: You start
> with a state description which allows arbitrary values of variables.
Then you'll end up with a tree of states, in the best case, and a graph
(with loops and knots) in the worst case.
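
The tree-versus-graph distinction can be sketched with a toy state exploration. This is not a real symbolic executor: an unknown branch condition is modelled simply by following both successors, and the two example "programs" (`two_tests`, `counting_loop`) are hypothetical. The explorer counts edges that lead into an already visited state; zero such edges means the states form a tree, anything else means that paths merge or loop.

```python
from collections import deque

def explore(start, successors):
    """Breadth-first exploration of all states reachable from `start`.

    Returns (states, edges, merging), where `merging` counts edges into a
    state that had already been seen.
    """
    seen = {start}
    edges = []
    merging = 0
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            edges.append((state, nxt))
            if nxt in seen:
                merging += 1
            else:
                seen.add(nxt)
                queue.append(nxt)
    return seen, edges, merging

def two_tests(state):
    # Two consecutive tests on unknown inputs; the state records the
    # decisions taken so far, so no two paths ever reach the same state.
    if len(state) == 2:
        return []
    return [state + (True,), state + (False,)]

def counting_loop(state):
    # A loop whose exit test depends on an unknown input, with a counter
    # that wraps modulo 3; the wrap-around leads back to the start state.
    pc, i = state
    if pc == "loop":
        return [("loop", (i + 1) % 3), ("exit", i)]
    return []

tree = explore((), two_tests)
graph = explore(("loop", 0), counting_loop)

print(len(tree[0]), tree[2])    # 7 states, 0 merging edges: a tree
print(len(graph[0]), graph[2])  # 6 states, 1 merging edge: a graph with a loop
```
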
> But what if you know the obfuscation method? Assuming that the
> obfuscation method is polynomic, deobfuscation is at worst NP-hard, so
> it is decidable. But it can be so intractable that it doesn't matter.
It may be possible to crack algorithmic obfuscation, but the odds are bad
for the "handmade" obfuscation I encountered. The intended (and achieved)
effect was optimization (mostly for smaller size); the resulting
obfuscation was only a side effect.
Even if one can produce equivalent assembler code, with some tricks
(macros...) for data structures with multiple meanings (instructions in
instruction arguments...), that code will remain hard to understand -
and that's the primary goal of every obfuscation. Who will be able to
tell the *purpose* of a state machine or other automaton, given only its
More unobfuscation problems come to mind, such as the use of modified
external code, or perhaps merely different versions of (shared) standard
libraries. When some code relies on the values returned from such
external subroutines, and on the precise implementation in a specific
library version, the entire environment (the version of the OS and of all
installed libraries) has to be taken into account.
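
A small sketch of such a dependency, under stated assumptions: `lib_v1_compare` and `lib_v2_compare` stand for two hypothetical versions of the same library routine, both honouring a documented contract of "negative, zero or positive", and `classify` is hypothetical caller code that relies on the exact values -1/0/1 of the first version to index a table.

```python
def lib_v1_compare(a, b):
    # Hypothetical library version 1: always returns exactly -1, 0 or 1.
    return (a > b) - (a < b)

def lib_v2_compare(a, b):
    # Hypothetical library version 2: same documented contract (sign only),
    # but it returns the raw difference.
    return a - b

HANDLERS = ["less", "equal", "greater"]

def classify(compare, a, b):
    # Relies on the exact values -1/0/1 instead of only the sign, so its
    # behaviour depends on which library version is installed.
    return HANDLERS[compare(a, b) + 1]

with_v1 = classify(lib_v1_compare, 2, 5)   # index 0
with_v2 = classify(lib_v2_compare, 2, 5)   # index -2, silently wrong entry

print(with_v1)  # less
print(with_v2)  # equal
```

Anyone analysing `classify` alone cannot tell which table entry is selected without also knowing the exact library version it ran against.
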