Is there any utility out there in the market that would convert Z80
or 8086 code to C?
Thanks
I'd have to say impossible.
Or at the very least, improbable, because there are things that C cannot
represent (efficiently) that asm can.
Percival
In the general case, you are correct, but there have been some efforts
(more or less successful) to automatically reverse engineer code that
had *originally* been written in C. Specifically, I recall Cristina
Cifuentes' dcc, written in the early 1990's. See
http://www.itee.uq.edu.au/~cristina/dcc.html for details.
Ed
gcc -S myfile.c
outputs "myfile.s", which is a gas assembly translation of that C source.
Percival
It's known as a "decompiler". It's a myth. The idea that you can automatically
reverse engineer code is not reasonable. Even if you are able to recognize a
few basic constructs in the assembly, such as loops, there is always going
to be code that your recognizer does not understand.
The question about decompilers comes up fairly often. Just as often, some
posters have insisted there were workable decompilers available. A few years
ago, I decided to look for one myself. I invited the "decompiler advocates"
to send me tips on where to find decompilers, and searched extensively myself.
I wanted to see if anyone got even 50% of the code right.
I found several programs that claimed to be decompilers, including one
written as part of a master's thesis from a reputable institution.
What I found was that virtually all of them crashed or produced no reasonable
result on the dozens of programs I fed them. I gave them programs ranging from
GCC compiled simple code, to Windows or DOS programs, etc. In one case, the
"master's thesis" program, the decompilation worked only on the program the
author had supplied (despite the author's own assertions that the program
"should decompile everything").
The myth of a decompiler has its basis in a factual product that was commonly
used to "decompile" Basic programs. What these programs actually did was
decode the internal form for the programs. It worked because the "compiled"
basic was not actually compiled, but rather changed to an internal form,
sometimes a trivial encoding. Thus, for example, Visual Basic might be
"decompiled" because it was never really compiled. Visual Basic is not a true
compiler.
--
Samiam is Scott A. Moore
Personal web site: http://www.moorecad.com/scott
My electronics engineering consulting site: http://www.moorecad.com
ISO 7185 Standard Pascal web site: http://www.moorecad.com/standardpascal
Classic Basic Games web site: http://www.moorecad.com/classicbasic
The IP Pascal web site, a high performance, highly portable ISO 7185 Pascal
compiler system: http://www.moorecad.com/ippas
Being right is more powerful than large corporations or governments.
The right argument may not be pervasive, but the facts eventually are.
The point is the thread author wants to go the other direction: ASM to C. There
are disassemblers which can do this along with many compilers - not that
difficult. I know of none that do the opposite which is more difficult: think
of the relationship of a C statement to assembler source compared to the
opposite.
<snip>
>
>The myth of a decompiler has its basis in a factual product that was commonly
>used to "decompile" Basic programs. What these programs actually did was
>decode the internal form for the programs. It worked because the "compiled"
>basic was not actually compiled, but rather changed to an internal form,
>sometimes a trivial encoding. Thus, for example, Visual Basic might be
>"decompiled" because it was never really compiled. Visual Basic is not a true
>compiler.
Not quite.
VBs 1, 2, 3 were only compiled to P-code. There exist programs that
create source from that P-code.
VB4 doesn't seem to have the P-code option, I just checked.
VB 6 has a compile to P-code option. I think that VB5 did also.
Also, it would not be impossible to write a program to decompile
QuickBasic programs. (Versions 4.5 thru VBDOS 1.0) There is only a
limited amount of optimization done by the compiler. All of these
versions use essentially the same code generator, and for the most
part, one sequence of machine code comes from one set of source.
As it would be quite difficult, I would not want to tackle the job,
unless someone forked over a sizable chunk of cash. :-)
--
Arargh409 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html
To reply by email, remove the garbage from the reply address.
I've seen this done in a production environment. The original code was
written in 16-bit assembly, and the new target platform was a 32-bit
C compiler. The assembly code was too long to rewrite in C by hand.
The solution was a simple asm-to-C translator that declared a variable
for each register in the 16-bit instruction set. It emitted code for each
assembly instruction. The translator also emitted labels as is. The only
things that needed a few tricks were the code that handled function calls
with C gotos and the C array that replaced the original stack. The code
was a bit inefficient, but it was possible to run it on non-x86 CPUs.
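
To give a rough idea of what such output can look like, here is a minimal
sketch. It is not the actual translator's output: the tiny assembly routine,
the names run_translated, push16, pop16 and stack_mem, and the "return site
id" trick for call/ret are all assumptions made up for illustration. Only the
zero flag is modelled, and SP is an index into the C array rather than a real
8086 stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* One C variable per 16-bit register (only the ones this example uses). */
static uint16_t AX, BX, CX, SP;
static int ZF;                        /* zero flag only, for brevity */

/* C array standing in for the original stack (size is an assumption). */
#define STACK_WORDS 256
static uint16_t stack_mem[STACK_WORDS];

static void push16(uint16_t v) { assert(SP > 0); stack_mem[--SP] = v; }
static uint16_t pop16(void) { assert(SP < STACK_WORDS); return stack_mem[SP++]; }

/* Hypothetical translation of:
 *            mov  cx, 5
 *            call sum_to_cx
 *            mov  bx, ax
 *            hlt
 * sum_to_cx: mov  ax, 0
 * again:     add  ax, cx
 *            dec  cx
 *            jnz  again
 *            ret
 */
uint16_t run_translated(void)
{
    SP = STACK_WORDS;                 /* empty stack */

    CX = 5;
    push16(1); goto sum_to_cx;        /* call: push a return site id, then jump */
ret_site_1:
    BX = AX;
    return BX;                        /* hlt: hand the result back to the caller */

sum_to_cx:
    AX = 0;
again:
    AX = (uint16_t)(AX + CX);         /* add ax, cx (flags not modelled here) */
    CX = (uint16_t)(CX - 1);          /* dec cx */
    ZF = (CX == 0);
    if (!ZF) goto again;              /* jnz again */
    switch (pop16()) {                /* ret: dispatch on the saved return site */
    case 1:  goto ret_site_1;
    default: return 0xFFFF;           /* unknown return site */
    }
}
```

Calling run_translated() walks the loop five times and leaves the sum in BX.
The gotos are ugly, but every line maps back to exactly one instruction,
which is what makes this kind of translation mechanical.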
Viktor
ps:
For running Z80 or 8086 code, there are various emulators written in C
that take the original code and emulate the original CPU. Their efficiency
is around what Java can achieve without the JIT compiler.
I see you're getting replies about decompilers. If by "z80 or 8086 code"
you meant executables (binaries), then those are the replies you want.
However, if you meant assembler source, then what you're looking for is a
translator.
I encountered one of these some years ago and it seemed like it would work,
but it also seemed like it was an awful answer unless you just HAD to get a
program running on another platform and couldn't afford to spend much time
understanding it. It basically took the program and did a translation into
a custom emulation of the given processor in C for that program.
For example, the sequence
MOV AX, Wocka
ADD AX, BX
became in C
AX = Wocka;
AX = ADD(AX, BX);
And the purpose of that ADD function was to set a series of global variables
which represented the flags register such that they reflected what the
results of the ADD operation would have done on the 8086.
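
I don't have that translator's runtime any more, so the following is only a
sketch of what such an ADD helper might look like. The flag variable names
(CF, ZF, SF, OF) are my own choice, and the auxiliary-carry and parity flags,
which the real 8086 ADD also updates, are left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

/* Globals standing in for 8086 flag bits (names are illustrative). */
static int CF, ZF, SF, OF;

/* 16-bit ADD: returns the sum and records the flags the 8086 would set.
 * AF and PF are omitted in this sketch. */
uint16_t ADD(uint16_t dst, uint16_t src)
{
    uint32_t wide = (uint32_t)dst + src;   /* keep the carry out of bit 15 */
    uint16_t res = (uint16_t)wide;

    CF = (wide > 0xFFFFu);                 /* unsigned carry */
    ZF = (res == 0);                       /* result is zero */
    SF = (res & 0x8000u) != 0;             /* sign bit of the result */
    /* Signed overflow: both operands have the same sign, and the
     * result's sign differs from it. */
    OF = ((~(dst ^ src) & (dst ^ res)) & 0x8000u) != 0;
    return res;
}
```

A later conditional jump then becomes a plain test on one of those globals,
for example "if (CF) goto some_label;" for a JC.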
The thing was something like a three or four pass translator, as I recall.
It found loops and functions and such on early passes and filled in other
stuff later. I can't recall at this time (twenty years later) who made it
or even how well it worked. I do recall one of my co-workers saying
something like "Dijkstra would have a heart attack" because of all the
gotos.
This is NOT the one I've mentioned above (at least I don't recognize the
name), but take a look here
http://www.mpsinc.com/index.htm
for an outfit that still seems to be in the business of doing ASM to C (and
other) translations. They have ASM86, MASM and TASM to C translators.
Their wares seem a bit pricey to me, but depending on what you're trying to
do it may not be an issue for you.
- Bill
It would be unreasonable to expect to be able to dump any old binary in
and get well formatted C out, but there are things that can be (and are)
done automatically toward that end. For anyone with experience
thoroughly reverse engineering code manually, it is easy to understand
how some tasks are quite easily automated. For instance, recognizing the
beginning and end of routines, and the sizes of the stack frame (which
implies the presence of automatic variables) are two that come to mind
immediately. Also, it's easy to generate call trees and to start to
pick out some kinds of high level constructs (e.g. for and while loops,
switch and case statements).
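
As a small illustration of the first two tasks, here is a sketch that checks
whether a byte offset looks like the start of a routine with a conventional
16-bit frame setup (push bp / mov bp,sp / sub sp,N) and, if so, reports the
frame size. The function name frame_size_at is made up for this post, and it
assumes the compiler emitted that standard prologue. A real tool would also
have to cope with routines that have no frame at all and with data bytes that
merely look like code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If code[pos..] starts with
 *     push bp ; mov bp, sp ; [sub sp, N]
 * return the frame size N (0 if there is no sub), otherwise -1.
 * Assumes 16-bit 8086 encodings and a non-negative imm8 (N <= 127). */
long frame_size_at(const uint8_t *code, size_t len, size_t pos)
{
    if (pos + 3 > len) return -1;
    if (code[pos] != 0x55) return -1;                  /* push bp */

    /* mov bp, sp has two equivalent encodings: 8B EC and 89 E5 */
    int mov_bp_sp = (code[pos + 1] == 0x8B && code[pos + 2] == 0xEC) ||
                    (code[pos + 1] == 0x89 && code[pos + 2] == 0xE5);
    if (!mov_bp_sp) return -1;
    pos += 3;

    if (pos + 3 <= len && code[pos] == 0x83 && code[pos + 1] == 0xEC)
        return code[pos + 2];                          /* sub sp, imm8 */
    if (pos + 4 <= len && code[pos] == 0x81 && code[pos + 1] == 0xEC)
        return code[pos + 2] | (code[pos + 3] << 8);   /* sub sp, imm16 */
    return 0;                                          /* frame, no locals */
}
```

A nonzero result is a strong hint that the routine has automatic variables
occupying that many bytes below BP.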
> Even if you are able to recognize a
> few basic constructs in the assembly, such as loops, there is always going
> to be code that your recognizer does not understand.
>
> The question about decompilers comes up fairly often. Just as often, some
> posters have insisted there were workable decompilers available. A few
> years ago, I decided to look for one myself. I invited the "decompiler
> advocates" to send me tips on where to find decompilers, and searched
> extensively myself.
> I wanted to see if anyone got even 50% of the code right.
Everything I've seen either translates assembly as really ugly looking C
code (i.e. mov ax,7 becomes ax = 7; ) or it requires that the original
language is already known. I.e. it has to have been created in C,
compiled, and then reverse engineered as C.
> I found several programs that claimed to be decompilers, including one
> written as part of a master's thesis from a reputable institution.
Actually it was a doctoral thesis.
> What I found was that virtually all of them crashed or produced no
> reasonable result on the dozens of programs I fed them. I gave them
> programs ranging from GCC compiled simple code, to Windows or DOS
> programs, etc. In one case, the "master's thesis" program, the
> decompilation worked only on the program the author had supplied
> (despite the author's own assertions that the program "should
> decompile everything").
Actually, the author's home page says "Dcc has a fundamental
implementation flaw that limits it to about 30KB of input binary
program, i.e. it currently handles toy programs only!" RTFM.
Ed
I think the big myth here is that decompilation is not possible. It
seems this myth spreads like wildfire, despite evidence to the
contrary. Sure, it's true that fully automated decompilation of
arbitrary machine-code programs is not possible (it's equivalent to
the halting problem) -- however, this is true of many things.
If one is going to insist decompilation is a myth, one must also
conclude disassembling is a myth (also equivalent to the halting
problem), as well as automatically routing circuit
boards. There are many other examples of problems that are
equivalent to the halting problem for which we have useful programs;
they simply require some human input when they get stuck.
Decompilers are exactly the same.
Cristina Cifuentes's PhD thesis is regarded by many as one of the
most important modern papers on decompilation theory, and Mike Van
Emmerik has done some very important things as well (he is currently
working on type inference for his PhD). His confirmation seminar
concluded that binary code disassemblers have 3.5 problems to solve,
whereas binary code decompilers have 7 (see
http://www.itee.uq.edu.au/~emmerik/confirmation.html).
Probably http://www.program-transformation.org/twiki/bin/view/Transform/DecompilationPossible
has the best overview of what's possible with decompilation. A lot of
it depends on your expectations, just as with disassemblers.
<snip>
>
> Cristina Cifuentes's PhD thesis is regarded by many as one of the
> most important modern papers on decompilation theory, and Mike Van
> Emmerik has done some very important things as well (he is currently
> working on type inference for his PhD). His confirmation seminar
> concluded that binary code disassemblers have 3.5 problems to solve,
> whereas binary code decompilers have 7 (see
> http://www.itee.uq.edu.au/~emmerik/confirmation.html).
>
> Probably http://www.program-transformation.org/twiki/bin/view/Transform/DecompilationPossible
> has the best overview of what's possible with decompilation. A lot of
> it depends on your expectations, just as with disassemblers.
>
There has been recent progress in the automated decompilation area
that builds flow graphs and uses the techniques of modern flow graph
based optimizing compilers in reverse. I just heard a paper at the SCOPES
conference (Software and Compilers for Embedded Systems) that describes
the algorithms for decompiling digital signal processing code.
Z80 decompilation should be much easier.
Reference is: Johnstone, A. and Scott, E., "Suppression of Redundant
Operations in Reverse Compiled Code Using Global DataFlow Analysis",
in Schepers H. (ed.), 'Software and Compilers for Embedded Systems',
8th International Workshop, SCOPES 2004, September 2-3, 2004, Springer.
Authors' email addresses are: A.Joh...@rhul.ac.uk and E.S...@rhul.ac.uk.
See Morgan book for detailed flow graph compilation algorithms:
Morgan, R. 'Building an Optimizing Compiler', Digital Press (now
Elsevier), 1998.
/Steve
--
Steve Meyer Phone: (612) 371-2023
Pragmatic C Software Corp. email: sjm...@pragmatic-c.com
520 Marquette Ave. So., Suite 900
Minneapolis, MN 55402
<snip>
> There has been recent progress in the automated decompilation area
> that builds flow graphs and uses the techniques of modern flow graph
> based optimizing compilers in reverse. I just heard a paper at the SCOPES
> conference (Software and Compilers for Embedded Systems) that describes
> the algorithms for decompiling digital signal processing code.
> Z80 decompilation should be much easier.
> Reference is: Johnstone, A. and Scott, E., "Suppression of Redundant
> Operations in Reverse Compiled Code Using Global DataFlow Analysis",
> in Schepers H. (ed.), 'Software and Compilers for Embedded Systems',
> 8th International Workshop, SCOPES 2004, September 2-3, 2004, Springer.
> Authors' email addresses are: A.Joh...@rhul.ac.uk and E.S...@rhul.ac.uk.
Just remember, the halting problem doesn't claim that you can't write
a program that determines whether a particular program halts or
not. For many special cases, such programs can be written. The halting
problem states that you cannot create a program which will answer
this question for *all* program inputs. Similarly, decompilation of
specialized files *is* possible. Especially if you know what compiler
(and version of the compiler) produced an object file and there
are certain limitations placed on the object file. Most "automated"
disassemblers/decompilers that work half-way decently rely on
such special cases. Of course, "great progress" in the area of
decompilers simply means that they can correctly disassemble/decompile
a *few* sample programs.
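
To make the "special cases" point concrete, here is a toy sketch of a checker
that answers "halts" only when it can prove it and says "unknown" otherwise.
The instruction model (struct insn with a single jump target) and the name
check_halts are invented for this example. It assumes execution stops when
the program counter runs off the end of the program.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { HALTS, UNKNOWN };

/* Hypothetical mini instruction: target < 0 means "no jump",
 * otherwise it is the index of the instruction jumped to. */
struct insn { int target; };

/* If every jump goes strictly forward, the program counter can only
 * increase, so the program must run off the end and halt. Any backward
 * (or self) jump might be a loop, so we give up rather than guess. */
enum verdict check_halts(const struct insn *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (prog[i].target >= 0 && (size_t)prog[i].target <= i)
            return UNKNOWN;
    }
    return HALTS;
}
```

The checker is never wrong when it says HALTS. It simply refuses to answer
for the hard cases, which is the same bargain a practical decompiler makes
when it hands a stubborn piece of code back to the human.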
It's great progress that's being made, but an interactive decompilation
is almost always necessary for practical use.
Cheers,
Randy Hyde
Yes, it is certainly possible to decompile a program for which the source
is known. I guess that is kind of a special case...
Seriously, if I found a program that decompiled one of my compiler's output
files, I would promptly change the compiler so that it would no longer do
so. Compilers are one of the most effective obfuscators around.
So the folks at DataRescue would have you believe :-)
Seriously, though, once you change the compiler, it's now a *different*
compiler. That doesn't change the fact that the decompiler was able
to handle the decompilation of the previous version. And, of course,
once you change your compiler, the decompiler author changes his
software. It's like the old "copyprotection/buster" cycle or the
"virus/antivirus" cycle, all over again :-)
Cheers,
Randy Hyde