Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
Making C compiler generate obfuscated code
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  21 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Dennis Yurichev  
View profile  
 More options Dec 7 2010, 10:07 am
Newsgroups: comp.compilers
From: Dennis Yurichev <dennis.yuric...@gmail.com>
Date: Tue, 7 Dec 2010 07:07:12 -0800 (PST)
Local: Tues, Dec 7 2010 10:07 am
Subject: Making C compiler generate obfuscated code
Hi.

About my little attempt to hack Tiny C compiler's codegenerator so it
produces obfuscated code:
http://blogs.conus.info/node/58


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Paul Biggar  
View profile  
 More options Dec 9 2010, 12:53 pm
Newsgroups: comp.compilers
From: Paul Biggar <paul.big...@gmail.com>
Date: Thu, 9 Dec 2010 09:53:37 -0800
Local: Thurs, Dec 9 2010 12:53 pm
Subject: Re: Making C compiler generate obfuscated code
On Tue, Dec 7, 2010 at 7:07 AM, Dennis Yurichev

<dennis.yuric...@gmail.com> wrote:
> About my little attempt to hack Tiny C compiler's codegenerator so it
> produces obfuscated code:
> http://blogs.conus.info/node/58

I'd be interested to know how much of this added code could be removed
by a binary- or assembly-level optimizer, and it's affect on the
reversibility of result?
--
Paul Biggar
paul.big...@gmail.com

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joshua Cranmer  
View profile  
 More options Dec 9 2010, 4:53 pm
Newsgroups: comp.compilers
From: Joshua Cranmer <Pidgeo...@gmail.com>
Date: Thu, 09 Dec 2010 16:53:59 -0500
Local: Thurs, Dec 9 2010 4:53 pm
Subject: Re: Making C compiler generate obfuscated code
On 12/07/2010 10:07 AM, Dennis Yurichev wrote:

> About my little attempt to hack Tiny C compiler's codegenerator so it
> produces obfuscated code:
> http://blogs.conus.info/node/58

From a decompilation aspect, it's not that difficult of obfuscated
code. Just by removing redundant assignments, knowing nothing else, I
produce this assembly:

a               proc near

var_CD500B      = byte ptr -0CD500Bh
arg_0           = dword ptr  8
arg_4           = dword ptr  0Ch
arg_1D364BDE    = byte ptr  1D364BE6h

                 nop
                 push    ebp
                 mov     ebp, esp
                 sub     esp, 0
                 nop

loc_800001F:
                 mov     ebx, 9EF81F3Eh
                 mov     eax, 0FD6D5D47h
                 sub     ebx, edx
                 mov     eax, [ebp+arg_4] ; *
                 shl     eax, 2          ; *
                 mov     ecx, [ebp+arg_0]
                 adc     ecx, ecx
                 mov     ecx, eax
                 or      ecx, eax
                 mov     ebx, 52A74759h
                 xor     edx, edx
                 jnz     short loc_800001F
                 mov     ecx, [ebp+arg_0] ; *
                 add     ecx, eax        ; *
                 mov     ebx, 0DAB5E429h
                 lea     edx, [ebx+75A1EF29h]
                 or      edx, edx
                 mov     eax, ecx        ; *
                 jmp     $+5
                 leave
                 pop     ebx
                 jmp     ebx
a               endp

Half the code is gone just by noticing that these values aren't being
used, with extremely liberal values of "used". If i notice that the jnz
branch is never taken, I get even shorter:
                 mov     eax, [ebp+arg_4] ; *
                 shl     eax, 2          ; *
                 mov     ecx, [ebp+arg_0] ; *
                 add     ecx, eax        ; *
                 mov     ebx, 0DAB5E429h
                 lea     edx, [ebx+75A1EF29h]
                 or      edx, edx
                 mov     eax, ecx        ; *

(I could combine the three obfuscated instructions into one, but I don't
want to do that by hand.)

If I wanted to deobfuscate this code, all I have to do is first run it
through an optimizer, and one simple enough to be written as a course
project at that. If you really want to obfuscate the code, it's better
to modify the control flow graph as opposed to inserting random
do-nothing code.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Torben Ęgidius Mogensen  
View profile  
 More options Dec 15 2010, 5:23 am
Newsgroups: comp.compilers
From: torb...@diku.dk (Torben Ęgidius Mogensen)
Date: Wed, 15 Dec 2010 11:23:49 +0100
Local: Wed, Dec 15 2010 5:23 am
Subject: Re: Making C compiler generate obfuscated code

Joshua Cranmer <Pidgeo...@gmail.com> writes:
> If I wanted to deobfuscate this code, all I have to do is first run it
> through an optimizer, and one simple enough to be written as a course
> project at that. If you really want to obfuscate the code, it's better
> to modify the control flow graph as opposed to inserting random
> do-nothing code.

Indeed.  I wsa not impressed by the approach: It made the code a lot
larger and a lot slower and the obfuscation was easy to remove.
Ideally, obfuscated code should only be marginally larger and slower
than "normal" code, but extremely difficult to reverse-engineer.

Some of the least readable code I have seen has been code where every
trick in the book was used to make it as short as possible, so using
extreme optimisaton tricks is probably a much better obfuscator than
inserting random code -- even random control-structure code.

Some common optimisations do, indeed, make the code less readable:
Strength reduction, loop scheduling, array blocking, common
subexpression elimination and so on.  So using these aggressively
would work well.  But there will be code where such optimsations do
not apply.  Here, you could use some of the following tricks:

 - Split a variable x into two variables p and q, such that x=p+q.
   Modifications to x are made into modifications of p _or_ q.  Uses of
   x (except those for self-modification such as x=x+1) are replaced by
   uses of p+q.  This will make the code somewhat larger and slower, but
   not by much.  Common subexpression elimination can remove some of the
   p+q additions without increasing readability.

 - Replace two otherwise unrelated variables x and y by p and q such
   that p=x+y and q=x-y (so x==(p+q)/2 and y==(p-q)/2).

 - A variable x holding small non-negative integers is replaced by a
   variable p = 2^x.  Additions and subtractions of constants to x are
   made into shifts.  Using x is more tricky, though, unlesss there is a
   "count leading zeroes" instruction.

        Torben


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
glen herrmannsfeldt  
View profile  
 More options Dec 16 2010, 3:52 am
Newsgroups: comp.compilers
From: glen herrmannsfeldt <g...@ugcs.caltech.edu>
Date: Thu, 16 Dec 2010 08:52:13 +0000 (UTC)
Local: Thurs, Dec 16 2010 3:52 am
Subject: Re: Making C compiler generate obfuscated code
Torben Fgidius Mogensen <torb...@diku.dk> wrote:
(snip on object code obfuscating compilers)

> Indeed.  I wsa not impressed by the approach: It made the code a lot
> larger and a lot slower and the obfuscation was easy to remove.
> Ideally, obfuscated code should only be marginally larger and slower
> than "normal" code, but extremely difficult to reverse-engineer.

It would seem that one could divide the code into blocks at
unconditional branches, and then move the blocks around randomly.
Though again, it might not be hard to unobfuscate.  I do remember
stories about compilers with optimization algorithms based on
random numbers and the natural difficulty with reverse engineering.

> Some of the least readable code I have seen has been code where every
> trick in the book was used to make it as short as possible, so using
> extreme optimisaton tricks is probably a much better obfuscator than
> inserting random code -- even random control-structure code.

That is, assuming such tricks are allowed.  As I understand it
(possibly incorrectly) the exception model in Java makes it
difficult to do many optimizations that traditionally would
have been done.  Moving statements outside of loops, for one.

> Some common optimisations do, indeed, make the code less readable:
> Strength reduction, loop scheduling, array blocking, common
> subexpression elimination and so on.  So using these aggressively
> would work well.  But there will be code where such optimsations do
> not apply.  Here, you could use some of the following tricks:

If they are predictable, though, it should be possible to write
a program to undo them.  Easier if the obfuscator is open source.

> - Split a variable x into two variables p and q, such that x=p+q.
>   Modifications to x are made into modifications of p _or_ q.  Uses of
>   x (except those for self-modification such as x=x+1) are replaced by
>   uses of p+q.  This will make the code somewhat larger and slower, but
>   not by much.  Common subexpression elimination can remove some of the
>   p+q additions without increasing readability.

Even better if you can find natural rearrangements of variables
that satisfy this goal.

> - Replace two otherwise unrelated variables x and y by p and q such
>   that p=x+y and q=x-y (so x==(p+q)/2 and y==(p-q)/2).

Probably not so bad for integers, but not good for floating point.

-- glen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joshua Cranmer  
View profile  
 More options Dec 16 2010, 10:23 am
Newsgroups: comp.compilers
From: Joshua Cranmer <Pidgeo...@gmail.com>
Date: Thu, 16 Dec 2010 10:23:45 -0500
Local: Thurs, Dec 16 2010 10:23 am
Subject: Re: Making C compiler generate obfuscated code
On 12/15/2010 05:23 AM, Torben Fgidius Mogensen wrote:

> Some of the least readable code I have seen has been code where every
> trick in the book was used to make it as short as possible, so using
> extreme optimisaton tricks is probably a much better obfuscator than
> inserting random code -- even random control-structure code.

I'm not sure what work has been done on deoptimization (perhaps anyone
with the Hex-Rays decompiler could tell us?), but some of the
optimization techniques seem relatively easy to reverse.

 From a purely theoretical standpoint, obfuscation that adds
non-executing code is going to be less difficult to reverse engineer
than obfuscation that does the same thing, just... less obviously.
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
[A good optimizer should make code much more obscure, since it throws
away everything not necessary to produce a correct result. Really good
optimizers like the ones in embedded linkers do gross tricks like combining
constants that happen to have the same bit pattern even though they mean
different things. -John]


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joshua Cranmer  
View profile  
 More options Dec 16 2010, 7:45 pm
Newsgroups: comp.compilers
From: Joshua Cranmer <Pidgeo...@gmail.com>
Date: Thu, 16 Dec 2010 19:45:40 -0500
Local: Thurs, Dec 16 2010 7:45 pm
Subject: Re: Making C compiler generate obfuscated code
On 12/16/2010 03:52 AM, glen herrmannsfeldt wrote:

> Torben Fgidius Mogensen<torb...@diku.dk>  wrote:
> (snip on object code obfuscating compilers)

>> Indeed.  I wsa not impressed by the approach: It made the code a lot
>> larger and a lot slower and the obfuscation was easy to remove.
>> Ideally, obfuscated code should only be marginally larger and slower
>> than "normal" code, but extremely difficult to reverse-engineer.

> It would seem that one could divide the code into blocks at
> unconditional branches, and then move the blocks around randomly.
> Though again, it might not be hard to unobfuscate.  I do remember
> stories about compilers with optimization algorithms based on
> random numbers and the natural difficulty with reverse engineering.

My decompiler worked by turning the disassembled code into a graph
(also removing unconditional jumps in the process), so the actual
location of hunks did not make when whit of difference.

> That is, assuming such tricks are allowed.  As I understand it
> (possibly incorrectly) the exception model in Java makes it
> difficult to do many optimizations that traditionally would have
> been done.  Moving statements outside of loops, for one.

Java does make guarantees about apparent execution order, but to the
degree that you know that statements are not going to cause "side
effects", you can reorder stuff.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin Ward  
View profile  
 More options Dec 17 2010, 5:00 am
Newsgroups: comp.compilers
From: Martin Ward <mar...@gkc.org.uk>
Date: Fri, 17 Dec 2010 10:00:42 +0000
Local: Fri, Dec 17 2010 5:00 am
Subject: Re: Making C compiler generate obfuscated code
On Thursday 16 Dec 2010 at 15:23, Joshua Cranmer <Pidgeo...@gmail.com> wrote:

> I'm not sure what work has been done on deoptimization (perhaps anyone
> with the Hex-Rays decompiler could tell us?), but some of the
> optimization techniques seem relatively easy to reverse.

>  From a purely theoretical standpoint, obfuscation that adds
> non-executing code is going to be less difficult to reverse engineer
> than obfuscation that does the same thing, just... less obviously.

A major theoretical result in this area is the paper "On the
(Im)possibility of Obfuscating Programs" by Boaz Barak et al,
published in: CRYPTO '01 Proceedings of the 21st Annual International
Cryptology Conference on Advances in Cryptology. Boaz Barak gives a
non-technical description of the results and their meaning here:

http://www.math.ias.edu/~boaz/Papers/obf_informal.html

The FermaT program transformation system can interactively transform
code into semantically equivalent forms: including restructuring,
simplification, dataflow analysis , SSA construction, slicing and, in
simple cases, abstraction to a specification.  The FermaT engine and
graphical front end runs on Windows and Linux and can be downloaded
from here:

http://www.gkc.org.uk/fermat.html

or:

http://www.cse.dmu.ac.uk/~mward/fermat.html

FermaT's primary commercial application is migrating assembler to
structured and maintainable C and COBOL, so the "deobfuscation"
transformations are geared towards removing the "clever tricks"
introduced by assembler programmers to save a byte here and a cycle
there: or just because they were

--
                        Martin

STRL Reader in Software Engineering and Royal Society Industry Fellow
mar...@gkc.org.uk  http://www.cse.dmu.ac.uk/~mward/  Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Mirrors:  http://www.gkc.org.uk  and  http://www.gkc.org.uk/gkc


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
glen herrmannsfeldt  
View profile  
 More options Dec 18 2010, 1:23 am
Newsgroups: comp.compilers
From: glen herrmannsfeldt <g...@ugcs.caltech.edu>
Date: Sat, 18 Dec 2010 06:23:36 +0000 (UTC)
Local: Sat, Dec 18 2010 1:23 am
Subject: Re: Making C compiler generate obfuscated code
Martin Ward <mar...@gkc.org.uk> wrote:

(snip)

> FermaT's primary commercial application is migrating assembler to
> structured and maintainable C and COBOL, so the "deobfuscation"
> transformations are geared towards removing the "clever tricks"
> introduced by assembler programmers to save a byte here and a cycle
> there: or just because they were

One of the tricks I remember from the 8 bit microprocessor days was to
put an instruction in the two bytes of a load instruction instead of a
branch around the instruction.

I used to have a table driven disassembler for 8080, Z80, and then
later 6809 code that would follow along from the entry point, keeping
track of conditional branch destinations on a stack.  When it reached
an unconditional branch it would take an entry off the stack.  It
keeps flags for each byte as the first byte (opcode), absolute
address, relative address, or, if not part of any instruction, data.
I suppose one would eventually learn enough tricks though.

Along the line of such tricks, there are stories of Apple II ROM
cloners claiming that so many such tricks were used that the Apple
implementation was the only one possible in the available number of
bytes.  Then, since it was the only one possible, it should not
protected against cloning.

-- glen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rob Warnock  
View profile  
 More options Dec 18 2010, 5:30 am
Newsgroups: comp.compilers
From: r...@rpw3.org (Rob Warnock)
Date: Sat, 18 Dec 2010 04:30:50 -0600
Local: Sat, Dec 18 2010 5:30 am
Subject: Re: Making C compiler generate obfuscated code
Martin Ward  <mar...@gkc.org.uk> wrote:
+---------------
| A major theoretical result in this area is the paper "On the
| (Im)possibility of Obfuscating Programs" by Boaz Barak et al,
| published in: CRYPTO '01 Proceedings of the 21st Annual International
| Cryptology Conference on Advances in Cryptology. Boaz Barak gives a
| non-technical description of the results and their meaning here:
|
| http://www.math.ias.edu/~boaz/Papers/obf_informal.html
+---------------

See also more recent results mentioned here:

    http://www.wisdom.weizmann.ac.il/~oded/p_obfuscate.html

In particular, there's a 2010 revised version of the CRYPTO'01
full paper here (54 pages, 0.5 MB PDF):

    http://www.wisdom.weizmann.ac.il/~oded/PS/obf4.pdf

Some snippets from the abstract:

    Our main result is that, even under very weak formalizations
    of the above intuition, obfuscation is impossible.
    ...
    We extend our impossibility result in a number of ways,
    including even obfuscators that (a) are not necessarily
    computable in polynomial time, (b) only approximately
    preserve the functionality, and (c) only need to work
    for very restricted models of computation (TC0). We also
    rule out several potential applications of obfuscators,
    by constructing "unobfuscatable" signature schemes,
    encryption schemes, and pseudorandom function families.

-Rob

-----
Rob Warnock                     <r...@rpw3.org>
627 26th Avenue                 <URL:http://rpw3.org/>
San Mateo, CA 94403             (650)572-2607


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hans-Peter Diettrich  
View profile  
 More options Dec 16 2010, 9:07 am
Newsgroups: comp.compilers
From: Hans-Peter Diettrich <DrDiettri...@aol.com>
Date: Thu, 16 Dec 2010 15:07:36 +0100
Local: Thurs, Dec 16 2010 9:07 am
Subject: Re: Making C compiler generate obfuscated code
Torben Fgidius Mogensen schrieb:

> Some of the least readable code I have seen has been code where every
> trick in the book was used to make it as short as possible, so using
> extreme optimisaton tricks is probably a much better obfuscator than
> inserting random code -- even random control-structure code.

Best obfuscation can be found in (older) Windows system code. Short
jumps have been replaced by dummy instructions (CMP), with other jumps
going to the argument of that instruction. What assembly code should a
disassembler generate from such ambiguous constructs?

Also nice are vararg-type subroutine parameters inlined after the call
statement, so that the location of the next statement after the call
is unknown. Even worse when multiple possible return adresses (jump
tables) are part of these parameters. Such obfuscation requires to
decode and emulate the called subroutine, before the location of the
next statement is known.

In practice such interruptions of the control flow make automatic
disassembling almost impossible. Instead a good *interactive*
disassembler is required (as I was writing when I came across above
tricks), and time consuming manual intervention and analysis is
required with almost every break in the control flow. The mix of data
and instructions not only makes it impossible to generate an assembler
listing, but also hides the use of memory locations (variables or
constants), with pointers embedded in the inlined parameter
blocks. Now tell me how a decompiler or other analysis tool should
deal with such constructs, when already the automatic separation of
code and data is impossible.

DoDi


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Torben Ęgidius Mogensen  
View profile  
 More options Dec 20 2010, 6:53 am
Newsgroups: comp.compilers
From: torb...@diku.dk (Torben Ęgidius Mogensen)
Date: Mon, 20 Dec 2010 12:53:51 +0100
Local: Mon, Dec 20 2010 6:53 am
Subject: Re: Making C compiler generate obfuscated code

Hans-Peter Diettrich <DrDiettri...@aol.com> writes:
> In practice such interruptions of the control flow make automatic
> disassembling almost impossible. Instead a good *interactive*
> disassembler is required (as I was writing when I came across above
> tricks), and time consuming manual intervention and analysis is
> required with almost every break in the control flow. The mix of data
> and instructions not only makes it impossible to generate an assembler
> listing, but also hides the use of memory locations (variables or
> constants), with pointers embedded in the inlined parameter
> blocks. Now tell me how a decompiler or other analysis tool should
> deal with such constructs, when already the automatic separation of
> code and data is impossible.

Using jump tables and the like is, indeed, going to make unobfuscation
hard.  Especially if the tables change dynamically.

You might be able to get around this by symbolic execution: You start
with a state description which allows arbitrary values of variables.
You then symbolically execute and update the state description as you go
along.  At any test, you split the state description into two new state
descriptions and continue symbolic execution on two individual paths.
Whenever symbolic execution reaches a previously-visited program point
with a state description equivalent to what was seen before, you make a
loop.  To keep the set of state descriptions finite, you apply
generalisation when a state description gets too complicated.

This process is similar to online partial evaluation, which also uses
state descriptions and generalisation.

That said, unobfuscation can never be perfect: Equivalence of programs
is undecidable, so it is in theory possible to make a program so
obfuscated that no automatic process can recover the original.

But what if you know the obfuscation method?  Assuming that the
obfuscation method is polynomic, deobfuscation is at worst NP-hard, so
it is decidable.  But it can be so intractable that it doesn't matter.

        Torben


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Neuner  
View profile  
 More options Dec 21 2010, 1:30 am
Newsgroups: comp.compilers
From: George Neuner <gneun...@comcast.net>
Date: Tue, 21 Dec 2010 01:30:32 -0500
Local: Tues, Dec 21 2010 1:30 am
Subject: Re: Making C compiler generate obfuscated code
On Sat, 18 Dec 2010 06:23:36 +0000 (UTC), glen herrmannsfeldt

<g...@ugcs.caltech.edu> wrote:
>Along the line of such tricks, there are stories of Apple II ROM
>cloners claiming that so many such tricks were used that the Apple
>implementation was the only one possible in the available number of
>bytes.  Then, since it was the only one possible, it should not
>protected against cloning.

I've heard these claims about the Apple II ROM for thirty years and I
think they are bogus.  I learned assembly programming on a IIe and it
was impossible to get much done without a good understanding of the
ROM routines.  Many subroutines had multiple entry points and many
were routinely reused from different contexts, but I don't believe the
Apple's ROM was any more complex than any other computer ROM of that
era (or indeed any 8-bit device today).

The major stumbling block to cloning the Apple IIe without licensing
the ROM code was Applesoft FP BASIC, which was implemented using a
bytecoded 16-bit virtual machine interpreter called Sweet16.
Approximately 5KB of Sweet16 code was included in the 16KB Apple IIe
ROM.  The cloners couldn't copy the Sweet16 code, nor could they
figure out how to fit a non-infringing work-alike BASIC into the same
address space.

There were a few other issues with the ROM code for the custom disk
controller and video (in particular the hi-res graphics IIRC).  Here
the problem was not having Wozniak's hardware ... work-alike code for
OTS video display and disk controllers couldn't be fit into the same
address space.

There eventually were a number of legal Apple IIe and //c clones, but,
AFAIK, all of them had much larger ROMs than the machine they copied
and had to use bank-switching and address shadowing to present a
compatible memory map for existing Apple programs.

YMMV but I don't consider implementing a virtual machine to be a
"trick".  Wozniak created Sweet16 to make programming easier ... it
ultimately saved some ROM space, but it was never intended to deceive
anyone ... the Sweet16 interpreter and the fact that it was used was
documented in the Apple ][ Integer BASIC manual.  I don't recall if
Sweet16 ever was mentioned in the FP BASIC manual, but FP BASIC was an
extension of Integer BASIC and so, ISTM, anyone interested should have
inferred that Sweet16 had been used in FP BASIC's implementation as
well.

George


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Neuner  
View profile  
 More options Dec 21 2010, 12:25 pm
Newsgroups: comp.compilers
From: George Neuner <gneun...@comcast.net>
Date: Tue, 21 Dec 2010 12:25:16 -0500
Local: Tues, Dec 21 2010 12:25 pm
Subject: Re: Making C compiler generate obfuscated code
On Mon, 20 Dec 2010 12:53:51 +0100, torb...@diku.dk (Torben Fgidius

Mogensen) wrote:
>Equivalence of programs is undecidable, ...

I'm not sure that's true ... at least in theory.  Turing equivalence
guarantees that any program can be expressed in the lambda calculus,
and the Church-Rosser theorem proves that two lambda expressions which
reduce to the same expression are equivalent.

In practice, though, I agree that deciding equivalence is intractable.

>... so it is in theory possible to make a program so
>obfuscated that no automatic process can recover the original.

Without meta information, such as exists in Java and .NET binaries, it
is already difficult to accurately reconstruct source for any but the
simplest of programs.

>But what if you know the obfuscation method?  Assuming that the
>obfuscation method is polynomic, deobfuscation is at worst NP-hard, so
>it is decidable.  But it can be so intractable that it doesn't matter.

I don't think knowing the method will help because any decent
implementation likely will have multiple ways to obfuscate any
particular construct and the selection will be psuedo-random.

George


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Walter Banks  
View profile  
 More options Dec 21 2010, 1:40 pm
Newsgroups: comp.compilers
From: Walter Banks <wal...@bytecraft.com>
Date: Tue, 21 Dec 2010 13:40:01 -0500
Local: Tues, Dec 21 2010 1:40 pm
Subject: Re: Making C compiler generate obfuscated code

George Neuner wrote:
> There were a few other issues with the ROM code for the custom disk
> controller and video (in particular the hi-res graphics IIRC).  Here
> the problem was not having Wozniak's hardware ... work-alike code for
> OTS video display and disk controllers couldn't be fit into the same
> address space.

Woz's disk controller was very clever. It used a simple state machine
to encode the data on and off the disk. At one point I wired up a
3 1/2 drive for my own use that read and write apple ][ disks (and
higher density that could not be read on an apple). At the time I
was impressed how well thought out Woz's design was. In my "one of"
I did something similar and used a 2708 eprom to store the jump
tables for a state machine.

> YMMV but I don't consider implementing a virtual machine to be a
> "trick".  Wozniak created Sweet16 to make programming easier ... it
> ultimately saved some ROM space, but it was never intended to deceive
> anyone ... the Sweet16 interpreter and the fact that it was used was
> documented in the Apple ][ Integer BASIC manual.  I don't recall if
> Sweet16 ever was mentioned in the FP BASIC manual, but FP BASIC was an
> extension of Integer BASIC and so, ISTM, anyone interested should have
> inferred that Sweet16 had been used in FP BASIC's implementation as
> well.

Woz wrote a Byte article documenting the Sweet16  code and use.
I don't think the FP basic used the Sweet 16 library. Sweet 16 was
used in the integer basic.

He also wrote a floating point package for the apple. At the time not all
6502's had an unconditional branch which meant that Woz used conditional
branches even when one was not needed. This single "feature" obfuscated
the sources almost to the point of almost being unreadable.

http://www.6502.org/source/floats/wozfp3.txt

All the best of the season,

Walter Banks
--
Byte Craft Limited
http://www.bytecraft.com
[Fascinating though this is, it doesn't have much to do with compilers
any more. -John]


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
glen herrmannsfeldt  
View profile  
 More options Dec 21 2010, 2:20 pm
Newsgroups: comp.compilers
From: glen herrmannsfeldt <g...@ugcs.caltech.edu>
Date: Tue, 21 Dec 2010 19:20:28 +0000 (UTC)
Local: Tues, Dec 21 2010 2:20 pm
Subject: Re: Making C compiler generate obfuscated code
Torben Fgidius Mogensen <torb...@diku.dk> wrote:
(snip)

> Using jump tables and the like is, indeed, going to make
> unobfuscation hard.  Especially if the tables change dynamically.

Yes for automated systems, but maybe not with a human in the loop.

If, for example, the target code is an interpreter then there is
likely a jump table for processing different statements.  Finding that
table, then, tells you more than finding a bunch of conditional
branches.

> You might be able to get around this by symbolic execution: You start
> with a state description which allows arbitrary values of variables.

Well, you do have to find the end of the table, which often isn't hard
done by hand.  (You can see when the addresses stop looking like
addresses.  Also, there might be a conditional test on the table
offset.)

(snip)

> This process is similar to online partial evaluation, which also uses
> state descriptions and generalisation.
> That said, unobfuscation can never be perfect: Equivalence of programs
> is undecidable, so it is in theory possible to make a program so
> obfuscated that no automatic process can recover the original.

Again, it helps to have a human in the loop, especially one who
know the art of the design of similar programs.

One of my first, larger, disassemblies (for personal use) was the
BASIC interpreter for the TRS-80 Color Computer, MC6809 code.
I was working on it for a while without finding the main loop
that reads statement codes and branches.  It turned out that such
loop is loaded into RAM at startup, and executes there.  When
disassembling ROM, one doesn't always expect execution in RAM.

Anyway, I believe that mostly automated but with a human in the
loop is the best way to do disassembly and deobfuscation.

-- glen


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin Ward  
View profile  
 More options Dec 22 2010, 6:30 am
Newsgroups: comp.compilers
From: Martin Ward <mar...@gkc.org.uk>
Date: Wed, 22 Dec 2010 11:30:48 +0000
Local: Wed, Dec 22 2010 6:30 am
Subject: Re: Making C compiler generate obfuscated code
On Tuesday 21 Dec 2010 at 17:25, George Neuner <gneun...@comcast.net> wrote:

> >Equivalence of programs is undecidable, ...

> I'm not sure that's true ... at least in theory.  Turing equivalence
> guarantees that any program can be expressed in the lambda calculus,
> and the Church-Rosser theorem proves that two lambda expressions which
> reduce to the same expression are equivalent.

Any program which takes no input and produces no output can only
do one of two things:

(a) Terminate.
(b) Not terminate.

So such a program is equivalent to either SKIP or ABORT.
If this equivalence were decidable, then termination would also be decidable.
But termination is undecidable (http://en.wikipedia.org/wiki/Halting_problem).

--
                        Martin

STRL Reader in Software Engineering and Royal Society Industry Fellow
mar...@gkc.org.uk  http://www.cse.dmu.ac.uk/~mward/  Erdos number: 4
[Termination is undecidable in general given an arbitrary program and
arbitrary input, but frequently decidable for a specific program and
specific input.  I'd think a program that takes no input would be
pretty easy to analyze. -John]


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hans-Peter Diettrich  
View profile  
 More options Dec 22 2010, 11:12 am
Newsgroups: comp.compilers
From: Hans-Peter Diettrich <DrDiettri...@aol.com>
Date: Wed, 22 Dec 2010 17:12:06 +0100
Local: Wed, Dec 22 2010 11:12 am
Subject: Re: Making C compiler generate obfuscated code
Torben Fgidius Mogensen schrieb:

>> In practice such interruptions of the control flow make automatic
>> disassembling almost impossible. Instead a good *interactive*
>> disassembler is required (as I was writing when I came across above
>> tricks), and time consuming manual intervention and analysis is
>> required with almost every break in the control flow. The mix of data
>> and instructions not only makes it impossible to generate an assembler
>> listing, but also hides the use of memory locations (variables or
>> constants), with pointers embedded in the inlined parameter
>> blocks. Now tell me how a decompiler or other analysis tool should
>> deal with such constructs, when already the automatic separation of
>> code and data is impossible.

> Using jump tables and the like is, indeed, going to make unobfuscation
> hard.  Especially if the tables change dynamically.

In the observed cases the presence of jump tables was unknown, and also
the structure and size of the data block, that follows the call
instruction :-(

> You might be able to get around this by symbolic execution: You start
> with a state description which allows arbitrary values of variables.

Then you'll end up with a tree of states, in the best case, and a graph
(with loops and knots) in the worst case.

> But what if you know the obfuscation method?  Assuming that the
> obfuscation method is polynomic, deobfuscation is at worst NP-hard, so
> it is decidable.  But it can be so intractable that it doesn't matter.

It may be possible to crack algorithmic obfuscation, but odds are bad
with the encountered "handmade" obfuscation. The intended (and achieved)
effect was optimization (almost for smaller size), and the resulting
obfuscation only was a side effect.

Even if one can produce equivalent assembler code, with some tricks
(macros...) for data structures with multiple meanings (instructions in
instruction arguments...), that code will remain hard to understand -
and that's the primary goal of every obfuscation. Who will be able to
tell the *purpose* of a state machine or other automaton, given only its
implementation?

More unobfuscation problems come into mind, like the use of modified
external code, maybe only different versions of (shared) standard
libraries. When some code relies on the values returned from such
external subroutines, and the precise implementation in a specific
library version, the entire environment (version of the OS and all
installed libraries) has to be taken into account.

DoDi


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Neuner  
View profile  
 More options Dec 23 2010, 4:08 pm
Newsgroups: comp.compilers
From: George Neuner <gneun...@comcast.net>
Date: Thu, 23 Dec 2010 16:08:06 -0500
Local: Thurs, Dec 23 2010 4:08 pm
Subject: Re: Making C compiler generate obfuscated code
On Wed, 22 Dec 2010 11:30:48 +0000, Martin Ward <mar...@gkc.org.uk>
wrote:

It is true that termination, in general, is undecidable ... but that
doesn't invalidate Church-Rosser equivalence.

The lambda calculus is based on term rewriting.  Church-Rosser
equivalence is based on pattern matching of lambda terms.  By applying
the rewrite rules for term expansion, substitution and reduction, if
two different terms can be rewritten to be in the same pattern form -
modulo variable names - then they are considered equivalent.  It
doesn't matter what the terms "mean" or what they might do if
"executed".

"Execution" in LC is a process of continually applying the rewrite
rules to the "program" term until no more reductions can be found (a
halt) or until a fixed point (a cycle) is observed.

George


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Torben Ęgidius Mogensen  
View profile  
 More options Jan 4 2011, 6:19 am
Newsgroups: comp.compilers
From: torb...@diku.dk (Torben Ęgidius Mogensen)
Date: Tue, 04 Jan 2011 12:19:18 +0100
Local: Tues, Jan 4 2011 6:19 am
Subject: Re: Making C compiler generate obfuscated code

George Neuner <gneun...@comcast.net> writes:
> It is true that termination, in general, is undecidable ... but that
> doesn't invalidate Church-Rosser equivalence.

> The lambda calculus is based on term rewriting.  Church-Rosser
> equivalence is based on pattern matching of lambda terms.  By applying
> the rewrite rules for term expansion, substitution and reduction, if
> two different terms can be rewritten to be in the same pattern form -
> modulo variable names - then they are considered equivalent.  It
> doesn't matter what the terms "mean" or what they might do if
> "executed".

There is no conflict here: Equivalence of lambda terms is undecidable.

Equivalence (more specifically beta equivalence), as you say, is defined
by saying that two terms t1 and t2 are equivalent if there is a term t3
such that t1 => t3 and t2 => t3, where => is the (reflective and
transitive) reduction relation.  But if t1 or t2 both are without normal
forms, you can not decide their equivalence by comparing a finite number
of terms.  Hence, equivalence is only semi-decidable.

There are various other ways to define equivalence of lambda terms
that attempt to equivalate more terms, such as all terms that have no
head normal form or weak head normal form.  But these are also
undecidable.  You can define equivalences that equate fewer terms and
which are decidable, but these are not useful as program equivalence.
Alpha equivalence and eta equivalence springs to mind.

        Torben


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Neuner  
View profile  
 More options Jan 6 2011, 3:45 pm
Newsgroups: comp.compilers
From: George Neuner <gneun...@comcast.net>
Date: Thu, 06 Jan 2011 15:45:23 -0500
Local: Thurs, Jan 6 2011 3:45 pm
Subject: Re: Making C compiler generate obfuscated code
On Tue, 04 Jan 2011 12:19:18 +0100, torb...@diku.dk (Torben Fgidius

I think it depends on which calculus you're working in.

>Equivalence (more specifically beta equivalence), as you say, is defined
>by saying that two terms t1 and t2 are equivalent if there is a term t3
>such that t1 => t3 and t2 => t3, where => is the (reflective and
>transitive) reduction relation.  But if t1 or t2 both are without normal
>forms, you can not decide their equivalence by comparing a finite number
>of terms.  Hence, equivalence is only semi-decidable.

Actually I'm talking about alpha equivalence: e.g.,  \x.x <==> \y.y
Identical form modulo variable names.

>There are various other ways to define equivalence of lambda terms
>that attempt to equivalate more terms, such as all terms that have no
>head normal form or weak head normal form.  But these are also
>undecidable.  You can define equivalences that equate fewer terms and
>which are decidable, but these are not useful as program equivalence.
>Alpha equivalence and eta equivalence springs to mind.

AFAIK, for the _untyped_ lambda calculus, head normal forms are not an
issue and you can prove or disprove alpha equivalence of forms by
recursive substitution and applicative order reduction until no more
substitutions or reductions are possible and then examining the
resultant forms.

Happy to be proven wrong though.  References?
George


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »