Even slow C code equivalent of assembly would be good.
Surely someone has a half-done parser out there!
I do have a license for the code. It is written as a Win16
application, but I am trying to port it to Win32. C code would be much,
much easier to port.
Thanks,
Lynn McGuire
[This question has come up over the years many times, and the answer is
that it's a really hard problem. Austin Code Works did an x86-to-C
translator ten or 15 years ago which worked, but wasn't very useful
because the C code was just a transliteration of the x86 code with
variable names like ax and ebp. I've seen some work on recovering
code from Vaxes, again a while ago. On the other hand, there's a lot
of work going on in binary translation, turning one kind of object
code to another. Look at the comp.compilers archives for messages
and conference announcements. -John]
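For illustration, here is a minimal Python sketch of the kind of one-instruction-at-a-time transliteration John describes. It is hypothetical (not the Austin Code Works tool), handles only register and immediate operands of a few instructions, and its output is legal C that still reads like assembly:

```python
# Hypothetical one-to-one transliterator: each x86 instruction becomes one
# C statement and each register becomes a C variable. Only register and
# immediate operands of a few two-operand instructions are handled.
TEMPLATES = {
    "mov": "{0} = {1};",
    "add": "{0} += {1};",
    "sub": "{0} -= {1};",
    "and": "{0} &= {1};",
    "shl": "{0} <<= {1};",
}


def transliterate(line: str) -> str:
    """Translate one 'op dst, src' instruction into one C statement."""
    op, _, rest = line.strip().partition(" ")
    template = TEMPLATES.get(op.lower())
    if template is None:
        # Anything unknown is kept verbatim as a C comment.
        return "/* untranslated: " + line.strip() + " */"
    operands = [o.strip() for o in rest.split(",")]
    return template.format(*operands)


for insn in ["mov ax, cx", "shl ax, 2", "add ax, bx"]:
    print(transliterate(insn))
```

This prints `ax = cx;`, `ax <<= 2;` and `ax += bx;`: correct, but no easier to maintain than the assembly.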
Trent Waddington
University of Queensland
Software Migrations Ltd, Durham University and (now) De Montfort University
have done a lot of work on converting IBM 370 assembler to C.
This is not just a transliteration of the assembler: it involves
extensive program transformation as well as control-flow and data-flow analysis.
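As a rough illustration of how even simple data-flow analysis improves on transliteration, the Python sketch below folds straight-line register assignments from one basic block into a single expression (forward substitution). This is a generic textbook technique, not FermaT's WSL transformation machinery, and it assumes no memory aliasing and side-effect-free expressions:

```python
import re

IDENT = re.compile(r"\b[a-z_]\w*\b")


def forward_substitute(block):
    """Fold (dest, expr) assignments from one basic block into expressions.

    Returns a dict mapping each assigned name to its final symbolic value.
    Assumes lowercase register names, no memory aliasing, and side-effect-free
    expressions; a real tool must verify all three.
    """
    env = {}
    for dest, expr in block:
        def lookup(match):
            name = match.group(0)
            if name not in env:
                return name  # value on entry to the block
            value = env[name]
            # Parenthesize compound values so operator precedence is kept.
            return value if re.fullmatch(r"\w+", value) else "(" + value + ")"
        env[dest] = IDENT.sub(lookup, expr)
    return env


# ax = cx; ax <<= 2; ax += bx;  written as (dest, expr) pairs
result = forward_substitute([("ax", "cx"), ("ax", "ax << 2"), ("ax", "ax + bx")])
print(result["ax"])  # (cx << 2) + bx
```

Three register-level statements collapse into one expression over the block's inputs, which is the first step away from "variable names like ax and ebp".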
My web site (with various papers): http://www.dur.ac.uk/~dcs0mpw/
Software Migrations web site (for information about the commercial
FermaT code comprehension and migration workbench): http://www.smltd.com
More recently I have started working with Cristina Cifuentes
(http://www.it.uq.edu.au/~cristina/), who has done a lot of work on
binary translation, to take the output of her x86 parser and migrate
it to WSL (FermaT's Wide Spectrum Language) and from there to C.
Martin
Marti...@durham.ac.uk http://www.dur.ac.uk/~dcs0mpw/ Erdos number: 4
> Does anyone know of a good tool to convert x86 assembly to C code ?
>
> Even slow C code equivalent of assembly would be good.
>
> Surely someone has a half-done parser out there!
At one time there was a web page devoted to the theory of the topic.
It had downloads, several of which were for the x86, but they weren't
very complete, because the state of the art in this area is far from
complete. If you're interested, email me at
mailto:cam...@bluegrass.net and I'll try to find it. You can also look
at www.decompiler.com, but I don't think that is the page I'm referring
to.
Meanwhile, the guys at this site are working in this general direction:
They have a very good disassembler package, primarily for the x86 but
also for other processors, and within the last few years they have added
support for complete identification of the run-time libraries of all
the popular C compilers for both DOS and Win16/Win32. If your code was
originally written in C, this could be handy for isolating areas you
don't need to rework.
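Run-time library identification of this kind is commonly done by matching byte signatures in which bytes that change at link time (relocated addresses, for example) are wildcards. The Python sketch below shows the idea; the signature table and the routine name in it are made up for illustration and are not taken from any real product or compiler library:

```python
# Made-up signature table: routine name -> byte pattern. None marks a
# wildcard byte, e.g. part of an address that the linker relocates.
SIGNATURES = {
    "example_lib_routine": [
        0x55,              # push bp
        0x8B, 0xEC,        # mov bp, sp
        0x57,              # push di
        0x8B, 0x7E, 0x04,  # mov di, [bp+4]
        0xE8, None, None,  # call rel16 (target varies with link layout)
    ],
}


def matches(code: bytes, offset: int, pattern) -> bool:
    """True if pattern matches code at offset, honouring wildcards."""
    if offset + len(pattern) > len(code):
        return False
    return all(p is None or code[offset + i] == p for i, p in enumerate(pattern))


def identify(code: bytes, offset: int):
    """Return the name of the first signature matching at offset, or None."""
    for name, pattern in SIGNATURES.items():
        if matches(code, offset, pattern):
            return name
    return None
```

Checking the whole pattern rather than a short prefix matters, because many compiler-generated prologues begin with the same push bp / mov bp, sp bytes.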
David
> [This question has come up over the years many times, and the answer is
> that it's a really hard problem.
Actually, I want to convert from the MASM source of the Win16
application. I have the source code along with the macros, comments,
etc. I am not trying to port raw machine-language code (that is
very, very difficult).
I have tried the commercial MASM2C converter, and it had some
major hiccups with the structures and macros that the code uses.
Thanks,
Lynn McGuire
>A big problem with decompilation in general is
>determining the number of parameters that a procedure takes. Liveness
>analysis can do this, but on x86 this means you have to determine which
>stack locations are used before they are defined. With the traditional C
>calling convention this is not a problem (i.e., Unix is OK), but when you
>introduce the Pascal calling convention (callee pops), you have no way of
>knowing exactly how the stack pointer changes after each call.
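A minimal Python sketch of the liveness idea for counting parameters, assuming a 16-bit near procedure with a standard BP frame ([bp+0] saved BP, [bp+2] return address, word arguments from [bp+4] up) and straight-line code; a real tool would run the analysis over the whole control-flow graph. It says nothing about who pops the arguments, which is exactly the difficulty raised above:

```python
def count_stack_parameters(accesses, first_arg_offset=4, slot_size=2):
    """Estimate how many word-sized stack parameters a procedure uses.

    accesses is a straight-line list of ("read" | "write", bp_offset) pairs.
    A slot at or above first_arg_offset is a parameter if it is read before
    it is written. Frame layout (16-bit near call, BP frame) is an assumption.
    """
    written = set()
    params = set()
    for kind, offset in accesses:
        if offset < first_arg_offset:
            continue  # locals, saved BP and return address are not arguments
        if kind == "read" and offset not in written:
            params.add(offset)  # live on entry: must have been passed in
        elif kind == "write":
            written.add(offset)
    if not params:
        return 0
    # Count every slot up to the highest one used, even if some are unused.
    return (max(params) - first_arg_offset) // slot_size + 1
```

For example, reads of [bp+4] and [bp+6] plus a write-then-read of [bp+8] give two parameters, since [bp+8] is defined before it is used.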
Perhaps you should revise your model?
Subroutine calls in general can result in any change to the stack;
some subroutines never return, while others take arguments from the
code segment immediately following the call and return to some other
location. That's a situation you must live with.
I'd distinguish regular from exceptional subroutines. Regular
subroutines have a fixed number of argument bytes on the stack, which
are popped by either the caller or the callee. Exceptional
subroutines can do whatever they like to the stack, and may return to
an arbitrary code location, or not return at all.
All the relevant indications can be found in the code of each
subroutine with a simple analysis of its sequential instructions.
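One simple indication of this kind is the form of the RET instructions. The Python sketch below classifies a subroutine from those alone: if every RET pops the same number of argument bytes it is treated as regular (0 meaning the caller pops), otherwise as exceptional. It is a deliberate simplification that ignores other direct changes to SP and arguments taken from the code stream, and it assumes comments have already been stripped:

```python
import re

# Matches RET, RETN and RETF with an optional immediate operand.
RET = re.compile(r"^\s*ret[nf]?\b\s*(\w*)\s*$", re.IGNORECASE)


def parse_imm(text: str) -> int:
    """Parse a MASM-style immediate: decimal, or hex with an 'h' suffix."""
    if not text:
        return 0
    if text.lower().endswith("h"):
        return int(text[:-1], 16)
    return int(text)


def classify(instructions):
    """Return ("regular", n) or ("exceptional", None) for a subroutine."""
    popped = {parse_imm(m.group(1)) for m in map(RET.match, instructions) if m}
    if len(popped) == 1:
        # Every RET removes the same n argument bytes; n == 0 means caller pops.
        return ("regular", popped.pop())
    # No RET at all (never returns normally) or RETs that disagree.
    return ("exceptional", None)
```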
DoDi
I've done some work on this, and on other source translation, and the
answer is a flexible translation system that can easily LEARN, or at
least be taught, the idioms of the particular source language (and
programmer). Start with a competent translation, recognize the
areas of poor understanding, then attack them with a tool that can
specialize in each particular pattern and produce better output. Of
course, it must be capable of decompiling simple arithmetic
expressions, and of recognizing logical expressions, which are usually
expressed as a comparison followed by conditional SKIPs or JMPs.
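Recognizing a comparison followed by a conditional jump can start as a simple table lookup. In the Python sketch below (x86 jumps only; the signed/unsigned distinction between, say, JL and JB is deliberately collapsed, although a real translator must carry it into the C operand types), negate=True gives the inverted test needed when the jump skips over the body of an if statement:

```python
# x86 conditional jump -> C comparison operator (signedness collapsed).
JCC = {
    "je": "==", "jz": "==", "jne": "!=", "jnz": "!=",
    "jl": "<", "jb": "<", "jle": "<=", "jbe": "<=",
    "jg": ">", "ja": ">", "jge": ">=", "jae": ">=",
}
NEGATE = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def condition(cmp_insn: str, jcc_insn: str, negate: bool = False) -> str:
    """Turn 'cmp a, b' followed by 'jcc label' into a C condition string."""
    _, _, operands = cmp_insn.strip().partition(" ")
    a, b = (o.strip() for o in operands.split(","))
    op = JCC[jcc_insn.split()[0].lower()]
    if negate:
        op = NEGATE[op]  # the jump is taken when the if-body is skipped
    return f"{a} {op} {b}"
```

So `cmp ax, 10` / `jge skip` / body / `skip:` becomes `if (ax < 10) { body }`.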
An example of superfluous code is code that pushes a lot of registers
to call a routine and then pops them back. Any programmer would
understand which registers are being protected from the routine and
which are arguments; the translator should find the routine in the
database of procedures, suppress the protective pushes and pops, and
use the recorded arguments and result type/location in the function
call. Does it return a condition code, a pointer, or an "int", and in
which register? Is the condition code normally ==, >, < or overflow,
and what is TRUE in that case?
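A minimal Python sketch of that suppression, assuming a hypothetical procedure database in which each entry records the argument registers and the result register (the routine name GetItem and its register usage are invented for illustration):

```python
# Hypothetical procedure database: argument registers and result register.
PROCEDURES = {"GetItem": {"args": ["si", "cx"], "result": "ax"}}


def fold_call(insns):
    """Collapse 'push r...; call P; pop r...' into one C call statement.

    Returns None unless the pops exactly mirror the pushes (LIFO order),
    P is in the database, and the result register is not among the
    protected registers (popping it would destroy the result).
    """
    pushes = []
    i = 0
    while i < len(insns) and insns[i].startswith("push "):
        pushes.append(insns[i].split()[1])
        i += 1
    if i == len(insns) or not insns[i].startswith("call "):
        return None
    name = insns[i].split()[1]
    rest = insns[i + 1:]
    if not all(insn.startswith("pop ") for insn in rest):
        return None
    pops = [insn.split()[1] for insn in rest]
    entry = PROCEDURES.get(name)
    if entry is None or pops != pushes[::-1] or entry["result"] in pushes:
        return None
    return f"{entry['result']} = {name}({', '.join(entry['args'])});"
```

Anything that does not fit the pattern is left for a later, more specialized pass, in the spirit of the successive runs described next.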
Successive runs of the translator will remove the superfluous code,
and the result, along with the comments, which must be preserved, will
be more maintainable. Some comments are not worth saving: for one
customer, we taught the translator to suppress stylized comments about
the binary point in fixed-point binary arithmetic, since one of the
results of the translation was a move from assembly on a 16-bit
minicomputer to 64-bit floating point. That translator also recognized
that shifts could be multiplication or division by powers of 2 when
dealing with arithmetic items, and address-resolution adjustments
(which could be ignored) when dealing with pointers. I like to think
of these successive runs as hacking at the weeds around the object to
see whether the land is level, rolling, rutted, or even SWAMPY.
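The shift rule can be sketched in Python as below. The kind of the shifted item ("arithmetic" or "address") is assumed to come from an earlier type analysis, and the output assumes the target variable has become a C double, as in the move to floating point just described; for integer targets an arithmetic right shift is not equivalent to C's / on negative values, so that case would need more care:

```python
def translate_shift(op: str, dest: str, count: int, kind: str):
    """Translate 'op dest, count' where op is shl, sal, shr or sar.

    kind is "arithmetic" for numeric items or "address" for address
    arithmetic. Returns a C statement, or None when the shift is an
    address-resolution adjustment that can simply be dropped.
    """
    if kind == "address":
        return None
    factor = 1 << count  # shifting by count scales by 2**count
    if op in ("shl", "sal"):
        return f"{dest} *= {factor}.0;"
    if op in ("shr", "sar"):
        return f"{dest} /= {factor}.0;"
    raise ValueError("not a shift: " + op)
```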
The critical elements in the translator are fast turnaround and ease
of extending the database and logic; recognizing larger patterns
will produce more maintainable code.
In the case of Lynn's code, the real problem is the macros: expanding
them and then trying to decompile the resulting instructions to C will
never work as well as recognizing the macros themselves as instructions
and constructing productions based on their intended functionality.
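Treating macros as instructions can be sketched as a lookup from macro name to a C template, consulted before any expansion happens. The Python sketch below uses invented macro names, since the thread does not show Lynn's actual macros:

```python
# Hypothetical macro table: MASM macro name -> C template. The names are
# invented for illustration; they are not taken from Lynn's source.
MACRO_TEMPLATES = {
    "COPYSTR": "strcpy({0}, {1});",
    "CLEARMEM": "memset({0}, 0, {1});",
    "FIELD": "{2} = {0}.{1};",
}


def translate_macro(line: str):
    """Translate a macro invocation directly, without expanding it."""
    name, _, rest = line.strip().partition(" ")
    template = MACRO_TEMPLATES.get(name.upper())
    if template is None:
        return None  # not a known macro: fall back to instruction translation
    args = [a.strip() for a in rest.split(",")] if rest else []
    return template.format(*args)
```

Because the macro carries the programmer's intent, one table entry can replace a whole family of instruction-level patterns that the expanded code would otherwise require.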
>code from Vaxes, again a while ago. On the other hand, there's a lot
>of work going on in binary translation, turning one kind of object
>code to another. Look at the comp.compilers archives for messages
>and conference announcements. -John]
Bob Sheff, Independent Consultant
available for work on source translation projects
and other interesting coding.
bsheff2 AT yahoo D O T C O M