[LLVMdev] Advice - llvm as binary to binary translator ?

Erik Buck

unread,

Jun 21, 2008, 9:53:18 PM6/21/08

to llv...@cs.uiuc.edu

First, is there a way to search the archives for this list ? I
apologize in advance if I have stepped on a FAQ.

My goal is to execute legacy binary machine code from a very old one
of a kind computer on a variety of modern computers. I already wrote
an emulator for the legacy machine that executes the old machine
code. However, my emulator is just an interpreter and therefore has
some limitations:

- The emulator spends a lot of time in an executive loop that fetches
legacy instructions, decodes them, and jumps to appropriate C
functions that emulate each legacy instruction. The executive loop
also has to handle emulated interrupts, support single-step debugging,
etc.

- The emulator is compiled and run on only a few modern hardware/
operating system combinations. The emulator is fairly portable, but
extensive optimizations on some platforms restrict capabilities on
other platforms.

- The emulator executes the legacy machine code unmodified which is
good, but that means opportunities for optimization are lost. The
legacy machine code is full of dead code, jumps to jumps, redundant
sub-expressions, unnecessary memory accesses, etc. Back in the old
days, compilers really didn't optimize at all. They generated
horrible code that was sometimes hand modified.

My idea is to convert my emulator into a translator that emits LLVM IR
either directly or via calls to the LLVM library. I would then
execute the result via JIT or native code compilation...

Is this a reasonable approach ?
Can this approach be used even when the legacy code is self
modifying ? After a code modification, a re-translation and re-JIT
would be needed.

Are there any general suggestions ?

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Erik Buck

unread,

Jun 22, 2008, 11:08:43 PM6/22/08

to llv...@cs.uiuc.edu

I went ahead and tried to translate my legacy machine language into
IR. Legacy branch instructions have me stumped. The branch
instructions are already resolved to destination addresses in the
legacy machine code. For example, there is an instruction that
performs an unconditional branch to the address stored in legacy
register B1.

I can represent register B1 as a local variable:
%B1 = alloca i32 ; storage for emulated register B1
I can generate IR corresponding to legacy machine code that calculates
the value to store in %B1. I can generate the IR to store in %B1.
store i32 %indirectAddr, i32* %B1
Now, how do I generate an IR "br" instruction to the calculated
address in %B1 ? I don't have a suitable destination label in my IR.
I can't create a block for the destination address if the address is
calculated by the legacy code (can I?)

Am I off to the wrong approach ? Are there any suggestions ?

Eli Friedman

unread,

Jun 23, 2008, 12:54:42 AM6/23/08

to LLVM Developers Mailing List

On Sun, Jun 22, 2008 at 8:08 PM, Erik Buck <erik...@sbcglobal.net> wrote:
> I went ahead and tried to translate my legacy machine language into
> IR. Legacy branch instructions have me stumped. The branch
> instructions are already resolved to destination addresses in the
> legacy machine code. For example, there is an instruction that
> performs an unconditional branch to the address stored in legacy
> register B1.
>
> I can represent register B1 as a local variable:
> %B1 = alloca i32 ; storage for emulated register B1
> I can generate IR corresponding to legacy machine code that calculates
> the value to store in %B1. I can generate the IR to store in %B1.
> store i32 %indirectAddr, i32* %B1
> Now, how do I generate an IR "br" instruction to the calculated
> address in %B1 ? I don't have a suitable destination label in my IR.
> I can't create a block for the destination address if the address is
> calculated by the legacy code (can I?)
>
> Am I off to the wrong approach ? Are there any suggestions ?

Completely wrong approach; you're not going to get anywhere trying to
statically translate machine code. (I actually tried something like
this once, and it broke apart in a similar way.)

As far as I know, the only project that made any real progress using
LLVM for binary translation is llvm-qemu
(http://code.google.com/p/llvm-qemu/). The approach there is roughly
to JIT one basic block at a time instead of interpreting one
instruction at a time, which is relatively simple and has a relatively
low overhead.

There's some documentation about llvm-qemu on its website, and there's
some good information on how to get started with the JIT at
http://llvm.org/docs/tutorial/.

That said, there are a lot of ways to speed up a pure interpreter; I'd
suggest trying that first before attempting a JIT-based solution. JIT
can be a useful tool, but translation time can end up being a
significant factor, and it will likely take a lot of work to get good
performance with it.

-Eli

Harry Metcalfe

unread,

Jun 24, 2008, 4:28:38 AM6/24/08

to LLVM Developers Mailing List

Hi Eric,

I'm currently writing an IA-32 to LLVMIR translator. I'm only mid way
through, but I can certainly say that there have been more difficulties
than I anticipated when I began!

I think that it is a reasonable approach, perhaps especially in your
case, since you have an emulator already. Automatic static translation
is equivalent to the halting problem for IA-32 code, though perhaps it
wouldn't be for yours (what architecture are you using?). A dynamic
phase is therefore necessary for me -- if it is for you too, you'll have
a leg up.

Self-modifying code is both hideous and unusual, and very difficult to
deal with. I'm leaving it to one side.

General thoughts: are you sure that LLVMIR is suitable? You may be
better off with a lower-level representation. At least in my case, LLVM
enforces a level of structure that doesn't exist in machine code. That's
something you'll also probably have to deal with.

Its type system also hampers the modification of translated code, so
it's advantageous to ensure that you won't need to change any code once
translated. This is of particular importance when you're trying to
figure out the bounds of an array, and things like that: a change to the
size of an array is a change of its type, which means it's much easier
just to get the size of the array right in the first place. I'm
currently in the process of altering my code so that a lot more analysis
takes place before translation even begins!

Finally, how will you deal with memory accesses and aliasing? This is
certainly the thorniest problem, and its the one my dynamic phase exists
to solve.

Do email me off-list if you like -- it sounds like we're pursuing
similar lines of inquiry!

Harry

Nuno Lopes

unread,

Jun 24, 2008, 5:28:06 AM6/24/08

to LLVM Developers Mailing List

You may also want to take a look at valgrind. It is capable of translating
IA-32/64, PPC-32/64, etc.. to its own SSA-style IR. And then it has backends
from the IR to the very same architectures. You can also build a backend to
LLVM and let it further optimize the generated code (although valgrind
already has its own optimizers).
Building such translators is not easy business. I can tell you that by
experience.. Depending on what you want to achieve, I would reuse something
already existent.

Nuno

Jonathan S. Shapiro

unread,

Jun 24, 2008, 2:33:50 PM6/24/08

to LLVM Developers Mailing List

This is off-topic. I would reply directly to Erick, but I seem to have
missed the note that had his reply address.

Erick, Harry:

The evidence in the literature supporting non-trivial optimization in a
dynamic binary to binary translator should be viewed with caution. Our
suspicion when we started the HDTrans work was that all that the fancy
optimizations accomplish in most cases is to (almost) make up for the
computational cost of generating the IR in the first place. The numbers
we achieved on IA32->IA32 translation seem to support that view. There
are special cases where this might not be true, and of course there are
specific counter-examples, but the HDTrans numbers apparently came as a
shock to some people in the dynamic translator world.

Erick mentioned in particular a number of control flow optimizations
that are very easily handled in a direct emitter (i.e. with no IR). Dead
code is also naturally eliminated by any dynamic translator. Redundant
expressions are not, but the performance advantage of modern processors
over legacy machines is so large that eliminating those may not be
worthwhile. Newer machines typically have more registers and/or fast L1
caches, both of which are your friend in a dynamic translator.

Given this, if performance actually matters your time and effort might
be better spent crafting a really tight binary decoder that pays careful
attention to cache utilization, and then implementing a relatively
modest number of branch to branch and call/return optimizations that are
fairly simple to do with a low-level direct translator.

Regardless of whether you accept my argument about the value of
optimization, it's clear from the HDTrans numbers that you can lose a
lot more performance in decoding and IR generation than the published
research work has generally acknowledged.