convert x86 assembly to c ?

Lynn McGuire

Sep 24, 2000, 3:00:00 AM

Does anyone know of a good tool to convert x86 assembly to C code ?

Even slow C code equivalent of assembly would be good.

Surely someone has a half done parser out there !

I do have the license for the code. The code is written as a Win16
app but I am trying to port it to Win32. C code would be much, much
easier to port.

Thanks,
Lynn McGuire
[This question has come up over the years many times, and the answer is
that it's a really hard problem. Austin Code Works did an x86 to C
translator ten or 15 years ago which worked, but wasn't very useful
because the C code was just a transliteration of the x86 code with
variable names like ax and ebp. I've seen some work on recovering
code from Vaxes, again a while ago. On the other hand, there's a lot
of work going on in binary translation, turning one kind of object
code to another. Look at the comp.compilers archives for messages
and conference announcements. -John]

Trent Waddington

Sep 25, 2000, 3:00:00 AM

I'm currently developing a retargetable decompiler which will work on
windoze binaries. However, 1) we are not aiming at Win16 because it
is such a pain and 2) we are nowhere near a useful stage of
development. A big problem with decompilation in general is
determining the number of parameters that a procedure takes. Liveness
analysis can do this, but on x86 this means you have to determine what
stack locations are used before they are defined. With traditional C
calling convention this is not a problem (i.e., Unix is OK) but when you
introduce pascal calling conventions - callee pop - you have no way of
knowing exactly how the stack pointer is changed after each call. So
you have to do a lot of analysis on the entire call graph just to
discover the stack depth at any point in the control flow graph. When
you have cycles in your call graph this problem becomes non-trivial.
Windows code makes use of both the C calling convention and the pascal
calling convention, for both local calls and calls to DLLs. There
are also other problems in determining the parameters of a procedure
that pop up when you start looking at register calling conventions, as
are present in most RISC machines.
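
To make the stack-depth problem concrete, here is a minimal sketch in
Python. It is not taken from the decompiler mentioned above: the
tuple-based instruction format, the name track_stack_depth, and the
callee_pops table are all assumptions made up for illustration. It
tracks how many bytes sit on the stack after each instruction of a
straight-line sequence, and shows that a single call to a procedure
with an unknown pop count makes every later depth unknown.

```python
WORD = 2  # assumption: 16-bit code, so each push/pop moves 2 bytes


def track_stack_depth(instrs, callee_pops):
    """Return the stack depth in bytes after each instruction.

    instrs is a list of tuples in a hypothetical mini-format:
    ("push", reg), ("pop", reg), ("add_sp", n), ("call", name).
    callee_pops maps a procedure name to the argument bytes it removes
    on return: 0 for the C convention, N for a Pascal-style "ret N".
    The return address pushed by call and popped by ret nets to zero,
    so it is not counted here.
    """
    depth = 0
    trace = []
    for ins in instrs:
        op = ins[0]
        if depth is not None:
            if op == "push":
                depth += WORD
            elif op == "pop":
                depth -= WORD
            elif op == "add_sp":
                depth -= ins[1]  # caller cleans up its own arguments
            elif op == "call":
                pops = callee_pops.get(ins[1])
                # Unknown callee: the depth cannot be known from here on.
                depth = None if pops is None else depth - pops
        trace.append(depth)
    return trace
```

With the C convention the caller's "add sp, 4" is visible in the code,
so the callee can be treated as popping nothing. With a callee-pop
procedure the same information has to come from analysing the callee
first, which is where the call-graph ordering (and its cycles) comes in.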

Trent Waddington
University of Queensland

Martin Ward

Sep 25, 2000, 3:00:00 AM

"Lynn McGuire" <win...@winsim.com> writes:
> Does anyone know of a good tool to convert x86 assembly to C code ?

Software Migrations Ltd, Durham University and (now) De Montfort University
have done a lot of work with converting IBM 370 Assembler to C code.
This is not just a transliteration of the assembler but includes
a lot of program transformations and control and data flow analysis.

My web site (with various papers): http://www.dur.ac.uk/~dcs0mpw/

Software Migrations web site (for information about the commercial
FermaT code comprehension and migration workbench): http://www.smltd.com

More recently I have started working with Cristina Cifuentes
(http://www.it.uq.edu.au/~cristina/), who has done a lot of work on
binary translation, to take the output of her x86 parser and migrate
it to WSL and from there to C.

Martin

Marti...@durham.ac.uk http://www.dur.ac.uk/~dcs0mpw/ Erdos number: 4

david lindauer

Sep 25, 2000, 3:00:00 AM

Lynn McGuire wrote:

> Does anyone know of a good tool to convert x86 assembly to C code ?
>

> Even slow C code equivalent of assembly would be good.
>
> Surely someone has a half done parser out there !

At one time there was a web page devoted to the topic from the point
of view of theory. It had downloads, several of which were for the
x86, but they weren't very complete because the state of the art in
this area is far from complete. If interested, email me at
mailto:cam...@bluegrass.net and I'll try to find it. You can also look
at www.decompiler.com, but I don't think that is the page I'm referring
to...

Meanwhile the guys at this site are working in this general direction:

www.datarescue.com

They have a very good disassembler package primarily for the x86 but
also for other processors; and within the last few years have put in
support for complete identification of the run-time libraries for all
the popular C compilers for both DOS and WIN16/32. If your code was
originally written in C this could be handy for isolating areas you
don't need to rework.

David

Lynn McGuire

Sep 25, 2000, 3:00:00 AM

Hi,

> [This question has come up over the years many times, and the answer is
> that it's a really hard problem.

Actually, I want to convert from the MASM source of the Win16
application. I have the source code along with the macros, comments,
etc... I am not trying to port raw machine language code (that is
very, very difficult).

I have tried the commercial MASM2C converter and it had some
major hiccups with structures and macros that the code uses.

Thanks,
Lynn McGuire

Lynn McGuire

Sep 28, 2000, 3:00:00 AM

Actually, if you are looking for a machine language to c code
translator, try http://www.backerstreet.com/rec/rec.htm

Thanks,
Lynn McGuire

VBDis

Sep 28, 2000, 3:00:00 AM

Trent Waddington <s33...@student.uq.edu.au> schreibt:

>A big problem with decompilation in general is
>determining the number of parameters that a procedure takes. Liveness
>analysis can do this, but on x86 this means you have to determine what
>stack locations are used before they are defined. With traditional C
>calling convention this is not a problem (i.e., Unix is OK) but when you
>introduce pascal calling conventions - callee pop - you have no way of
>knowing exactly how the stack pointer is changed after each call.

Perhaps you should revise your model?

Subroutine calls in general can result in any change to the stack;
some subroutines will never return, others take arguments from the
code segment, following the call, and will return to some other
location. That's a situation you must live with.

I'd distinguish regular from exceptional subroutines. Regular
subroutines have a fixed amount of argument bytes on the stack, which
are popped by either the caller or the callee. Exceptional
subroutines can do whatever they like to the stack, and may return to
any arbitrary code location, or not return at all.

All the relevant indications can be found in the code of any subroutine,
with a simple analysis of sequential instructions.
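
As a rough illustration of that kind of sequential scan, here is a
minimal Python sketch. The function names and the textual input format
(one instruction per line, MASM-style comments after ";") are
assumptions for illustration only. It looks at the return instructions
of one subroutine: a consistent "ret N" suggests callee-pop with N
argument bytes, a plain "ret" suggests caller-pop (or no stack
arguments at all), and anything else is flagged as exceptional.

```python
def parse_imm(text):
    """Parse a decimal or MASM-style hex immediate such as '4' or '0Ch'."""
    text = text.strip().lower()
    return int(text[:-1], 16) if text.endswith("h") else int(text, 10)


def classify_subroutine(lines):
    """Classify a subroutine from the operands of its return instructions.

    Returns ("callee-pop", n), ("caller-pop", 0) or ("exceptional", None).
    Limitation of this sketch: labels on the same line as an instruction
    and computed returns are not handled.
    """
    pop_counts = set()
    for line in lines:
        parts = line.split(";")[0].split()  # drop the comment, then tokenize
        if parts and parts[0].lower() in ("ret", "retn", "retf"):
            pop_counts.add(parse_imm(parts[1]) if len(parts) > 1 else 0)
    # No return at all, or returns that disagree: treat as exceptional.
    if len(pop_counts) != 1:
        return ("exceptional", None)
    count = pop_counts.pop()
    return ("callee-pop", count) if count else ("caller-pop", 0)
```

For example, a body ending in "ret 4" classifies as ("callee-pop", 4),
while a body that only jumps away classifies as exceptional.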

DoDi

bsh...@yahoo.com

Oct 1, 2000, 12:26:05 AM

"Lynn McGuire" <win...@winsim.com> wrote:
>Does anyone know of a good tool to convert x86 assembly to C code ?

<snip>
>Lynn McGuire

>[This question has come up over the years many times, and the answer is

>that it's a really hard problem. Austin Code Works did an x86 to C
>translator ten or 15 years ago which worked, but wasn't very useful
>because the C code was just a transliteration of the x86 code with
>variable names like ax and ebp. I've seen some work on recovering

I've done some work on this, and other source translation, and the
answer is a flexible translation system that can easily LEARN or at
least be taught the idioms of the particular source language (and
programmer). You start with a competent translation, recognize the
areas of poor understanding, then attack them with a tool that can
specialize on each particular pattern and produce better output. Of
course, it must be capable of simple arithmetic expression
decompilation, and recognize logical expressions usually expressed as
comparison then conditional SKIPs or JMPs.
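
A minimal Python sketch of the comparison-plus-jump idiom follows. The
table, the function name, and the line-per-instruction text format are
assumptions for illustration, not the translator described here. It
folds a "cmp" followed directly by a signed conditional jump into one
C-style conditional and leaves everything else untouched.

```python
# Signed conditional jumps and the C operator each one tests after "cmp a, b".
SIGNED_JUMPS = {"je": "==", "jne": "!=", "jl": "<",
                "jle": "<=", "jg": ">", "jge": ">="}


def fold_compare_jumps(lines):
    """Rewrite 'cmp a, b' + 'jcc label' pairs as 'if (a op b) goto label;'."""
    out = []
    i = 0
    while i < len(lines):
        cur = lines[i].split(None, 1)
        nxt = lines[i + 1].split() if i + 1 < len(lines) else []
        operands = ([s.strip() for s in cur[1].split(",")]
                    if len(cur) == 2 else [])
        if (cur and cur[0] == "cmp" and len(operands) == 2
                and len(nxt) == 2 and nxt[0] in SIGNED_JUMPS):
            left, right = operands
            out.append(f"if ({left} {SIGNED_JUMPS[nxt[0]]} {right}) goto {nxt[1]};")
            i += 2  # both instructions were consumed
        else:
            out.append(lines[i])
            i += 1
    return out
```

Unsigned jumps (ja, jb and friends) would need the operands treated as
unsigned in the generated C, which this sketch leaves out.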

An example of superfluous code: the code pushes a lot of registers
to call a routine and then pops them back. Any programmer would
understand which registers are being protected from the routine
and which are arguments. The translator should find the routine in the
database of procedures, suppress the protective push/pops, and use the
arguments in the function call along with the result type/location. Does
it return a condition code, a pointer, or an "int", and in which register?
Is the condition code normally ==/>/< or .overflo. and what is TRUE in that
case?
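
Here is a minimal Python sketch of that lookup, under stated
assumptions: the PROCS table, its "args"/"result" fields, the routine
name "scale", and the function simplify_call are all hypothetical. When
a call to a known routine is wrapped in pushes that are popped back in
mirror order, the wrapper is dropped and a single C-style call is
emitted.

```python
# Hypothetical procedure database: which registers carry the arguments
# and which register holds the result.
PROCS = {"scale": {"args": ["bx", "cx"], "result": "ax"}}


def simplify_call(block, procs):
    """Collapse 'push...; call f; pop...' into one C-style call statement.

    The block is returned unchanged when the callee is unknown, when the
    pushes and pops do not mirror each other, or when the result register
    is one of the saved registers (the pop would overwrite the result).
    """
    ops = [line.split() for line in block]
    call_at = next((i for i, op in enumerate(ops) if op and op[0] == "call"), None)
    if call_at is None or len(ops[call_at]) != 2 or ops[call_at][1] not in procs:
        return block
    name = ops[call_at][1]
    before, after = ops[:call_at], ops[call_at + 1:]
    saved = [op[1] for op in before if len(op) == 2 and op[0] == "push"]
    restored = [op[1] for op in after if len(op) == 2 and op[0] == "pop"]
    mirrored = (len(saved) == len(before) and len(restored) == len(after)
                and saved == restored[::-1])
    proc = procs[name]
    if not mirrored or proc["result"] in saved:
        return block
    return [f"{proc['result']} = {name}({', '.join(proc['args'])});"]
```

A real database entry would also have to record the result type and how
a condition code is reported, which are exactly the questions raised
above.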

Successive runs of the translator will remove the superfluous code,
and the result will be more maintainable, along with the comments
which must be preserved. Some comments are not worth saving, and for
one customer, we taught the translator to suppress stylized comments
about the binary point for fixed point binary arithmetic, since one of
the results of the translation was to go from assembly on a 16-bit mini
to 64-bit floating point. That translator also recognized that shifts
could be multiply/divide by powers of 2 when dealing with arithmetic
items, and address resolution adjustments (which could be ignored) when
dealing with pointers. I like to think of these successive runs as
hacking at the weeds around the object to see if the land is level
flat or rolling or rutted or even SWAMPY.
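
The shift rule for arithmetic items can be sketched in a few lines of
Python. The function name and the text format are assumptions for
illustration. Only constant shift counts are rewritten, and only shl
and shr: an arithmetic right shift (sar) of a negative value rounds
differently from C integer division, so it is left alone here.

```python
def shift_to_arith(line):
    """Rewrite 'shl r, n' / 'shr r, n' as multiply/divide by 2**n.

    Assumes the operand is an arithmetic item (not a pointer) and, for
    shr, that it is unsigned. Lines that do not match are returned as-is.
    """
    parts = line.replace(",", " ").split()
    if len(parts) == 3 and parts[0] in ("shl", "shr") and parts[2].isdigit():
        op = "*=" if parts[0] == "shl" else "/="
        return f"{parts[1]} {op} {1 << int(parts[2])};"
    return line
```

Deciding whether an operand is an arithmetic item or a pointer is the
hard part, and this sketch simply assumes the caller already knows.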

The critical elements in the translator are fast turnaround and ease
of extension to the database and logic. Recognizing larger patterns
will produce more maintainable code.

In the case of Lynn's code, the real problem is the macros. Expanding
them and then trying to decompile the instructions to C will never work as
well as recognizing the macros themselves as instructions and
constructing productions based on their intended functionality.
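
A minimal Python sketch of treating macros as instructions in their own
right: the macro names CLEAR and COPY, the template table, and the
function translate_macro are invented for illustration and have nothing
to do with the actual macros in Lynn's code. Each known macro invocation
is mapped straight to a C template instead of being expanded first.

```python
# Hypothetical macro table: macro name -> C template with positional slots.
MACRO_TEMPLATES = {
    "CLEAR": "{0} = 0;",
    "COPY": "{0} = {1};",
}


def translate_macro(line, templates):
    """Translate one macro invocation to C, or return None if it is not
    a known macro (or has too few arguments for its template)."""
    parts = line.split(None, 1)
    if not parts or parts[0] not in templates:
        return None
    rest = parts[1] if len(parts) > 1 else ""
    args = [a.strip() for a in rest.split(",")] if rest.strip() else []
    try:
        return templates[parts[0]].format(*args)
    except IndexError:
        return None  # fewer arguments than the template expects
```

Lines that come back as None would fall through to the ordinary
instruction-level translation.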

>code from Vaxes, again a while ago. On the other hand, there's a lot
>of work going on in binary translation, turning one kind of object
>code to another. Look at the comp.compilers archives for messages
>and conference announcements. -John]

Bob Sheff, Independent Consultant
available for work on source translation projects
and other interesting coding.
bsheff2 AT yahoo D O T C O M