Hi. I'm working on a project that involves translating a proprietary byte-code language into x86 machine code on the fly. Currently this language is interpreted at runtime. By translating the byte-code into x86 instructions at compile-time, we can execute the code natively at runtime using a function pointer to the instructions stored in data.
Initially I was planning on writing this translation phase myself, since the instruction set of the byte-code language is fairly simple and straightforward. But I wonder, is there an in-memory based x86 assembler available for use that we can build and link into our product, which takes, as input, a buffer of ascii text representing a function written in x86 assembly, and returns the machine code in a separate memory buffer?
There are free x86 assemblers out there, but I would have to modify the code and build it as a dll. I would also have to modify the output generation code to write the data to a memory buffer. These are things I can do, but I'm wondering if any of you have done that already? Also, this would be for commercial use. And lastly, an assembler that has optimization phases built into it would be a significant plus.
> Hi. I'm working on a project that involves translating a proprietary byte-code language into x86 machine code on the fly. Currently this language is interpreted at runtime. By translating the byte-code into x86 instructions at compile-time, we can execute the code natively at runtime using a function pointer to the instructions stored in data.
yep.
> Initially I was planning on writing this translation phase myself, since the instruction set of the byte-code language is fairly simple and straightforward. But I wonder, is there an in-memory based x86 assembler available for use that we can build and link into our product, which takes, as input, a buffer of ascii text representing a function written in x86 assembly, and returns the machine code in a separate memory buffer?
my assembler can do this (does require that the frontend provide some callbacks, mostly for allocating the buffer, ...), although typically I have been using it with its own internal in-memory linker...
it can also produce COFF objects (Win32 and Win64 / x64 variants), ...
> There are free x86 assemblers out there, but I would have to modify the code and build it as a dll. I would also have to modify the output generation code to write the data to a memory buffer. These are things I can do, but I'm wondering if any of you have done that already? Also, this would be for commercial use. And lastly, an assembler that has optimization phases built into it would be a significant plus.
granted, it is LGPL, but if this is a problem, I may be able to provide a version with the notices "mysteriously" removed (actually, I may or may not get around to moving my whole VM project to BSD for other reasons).
(note: LGPL is not "sticky" in exactly the same way as GPL, namely, it doesn't affect code linked to it, but I guess it does have a few other lesser issues...).
currently, it does produce a DLL, and is being built with MSVC + GNU make (shouldn't be too hard to make a VS project from it, although it does use a few provided tools for autoheadering and keeping the opcode listing up to date, which may be an issue with VS...).
as for the assembler, it was originally written to address more-or-less these same sorts of issues (although it does not presently include a micro-optimizer, it is assumed that the compiler/JIT produce "reasonably efficient" code).
also maybe worth looking into is YASM, which I guess includes some similar functionality (I have not fully investigated it though).
my assembler (BGBASM), NASM, and YASM all use fairly similar syntax, although "here and there" it is possible that code could run into differences.
> Initially I was planning on writing this translation phase myself, since the instruction set of the byte-code language is fairly simple and straightforward. But I wonder, is there an in-memory based x86 assembler available for use that we can build and link into our product, which takes, as input, a buffer of ascii text representing a function written in x86 assembly, and returns the machine code in a separate memory buffer?
You don't need to create ASCII source text. You could create the machine code directly using the same strategy as (other) JIT'ed languages, see, e.g., the assembler_x86.cpp file used by Java HotSpot compiler (available in OpenJDK), or a similar Open Source project http://code.google.com/p/asmjit/ . They are both made to directly generate code in a memory buffer.
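For illustration, a minimal sketch of the direct-emission style in C. The `CodeBuf` type and the `emit_*` helpers are hypothetical names made up for this example, not the HotSpot or asmjit API; the only facts assumed are the standard encodings `B8 imm32` for `mov eax, imm32` and `C3` for `ret`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer: caller-owned storage plus a write cursor. */
typedef struct {
    uint8_t *data;
    size_t   len;
    size_t   cap;
} CodeBuf;

/* Append one byte; overflowing the buffer is treated as a programming error. */
static void emit8(CodeBuf *cb, uint8_t b) {
    assert(cb->len < cb->cap);
    cb->data[cb->len++] = b;
}

/* Append a 32-bit value in little-endian order, as x86 immediates require. */
static void emit32(CodeBuf *cb, uint32_t v) {
    for (int i = 0; i < 4; i++)
        emit8(cb, (uint8_t)(v >> (8 * i)));
}

/* mov eax, imm32  ->  B8 followed by the 4-byte immediate */
static void emit_mov_eax_imm32(CodeBuf *cb, uint32_t imm) {
    emit8(cb, 0xB8);
    emit32(cb, imm);
}

/* ret  ->  C3 */
static void emit_ret(CodeBuf *cb) {
    emit8(cb, 0xC3);
}

/* Encode "mov eax, value; ret" into out and return the number of bytes written. */
size_t build_return_const(uint8_t *out, size_t cap, uint32_t value) {
    CodeBuf cb = { out, 0, cap };
    emit_mov_eax_imm32(&cb, value);
    emit_ret(&cb);
    return cb.len;
}
```

Calling `build_return_const(buf, sizeof buf, 42)` leaves the six bytes `B8 2A 00 00 00 C3` in `buf`. To actually call them, the bytes would have to live in memory mapped as executable (for example via `VirtualAlloc` with `PAGE_EXECUTE_READWRITE` on Windows, or `mmap` with `PROT_EXEC` on POSIX); that step is left out here.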
>> Initially I was planning on writing this translation phase myself, since the instruction set of the byte-code language is fairly simple and straightforward. But I wonder, is there an in-memory based x86 assembler available for use that we can build and link into our product, which takes, as input, a buffer of ascii text representing a function written in x86 assembly, and returns the machine code in a separate memory buffer?
> You don't need to create ASCII source text. You could create the machine code directly using the same strategy as (other) JIT'ed languages, see, e.g., the assembler_x86.cpp file used by Java HotSpot compiler (available in OpenJDK), or a similar Open Source project http://code.google.com/p/asmjit/ . They are both made to directly generate code in a memory buffer.
it is not "necessary" to do so, but I can think of a few good reasons to do so.
one of the more major ones being that ASCII based ASM is far more generic than API calls, and is also far better at abstracting one's compiler from one's assembler (thus improving project modularity).
namely: the codegen can be naturally abstracted from the assembler, since as far as it is concerned, it is producing text which goes in a buffer.
it is also much more convenient to produce textual ASM, since one can lump together several opcodes per string, and use a printf-style interface, rather than having to use an API call per opcode.
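A rough sketch of what such a printf-style interface could look like. `AsmText`, `asm_printf` and `gen_add_const` are hypothetical names for this example only (not BGBASM's API), and the emitted text assumes NASM-like syntax and a 32-bit cdecl function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size buffer holding the generated assembly source. */
typedef struct {
    char   text[1024];
    size_t len;
} AsmText;

/* Append formatted text. Returns 0 on success, -1 if it would not fit. */
static int asm_printf(AsmText *a, const char *fmt, ...) {
    size_t room = sizeof a->text - a->len;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(a->text + a->len, room, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= room) {
        a->text[a->len] = '\0';   /* discard the partial output */
        return -1;
    }
    a->len += (size_t)n;
    assert(a->len == strlen(a->text));
    return 0;
}

/* Emit a function that returns its first stack argument plus k.
   Several opcodes go out in a single call, one format string. */
int gen_add_const(AsmText *a, const char *name, int k) {
    a->len = 0;
    a->text[0] = '\0';
    return asm_printf(a,
        "%s:\n"
        "  push ebp\n"
        "  mov ebp, esp\n"
        "  mov eax, [ebp+8]\n"
        "  add eax, %d\n"
        "  pop ebp\n"
        "  ret\n",
        name, k);
}
```

The resulting string would then be handed to whatever assembler sits behind the codegen; the codegen itself never sees an opcode encoding.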
more so, some kinds of micro-optimizations, such as knowing when 8-bit jumps are safe, ... can't readily be done in a single pass, since the distance to a forward target isn't known until the code in between has been sized; multiple passes may be needed for things like this (otherwise, every jump has to use a 32-bit displacement).
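To illustrate why this takes more than one pass, here is a minimal sketch of jump sizing by iterating to a fixed point. It is not how any particular assembler does it; the `Insn` layout and `relax_jumps` are assumptions for the example, and only unconditional jumps are modeled (2 bytes for `EB rel8`, 5 bytes for `E9 rel32`).

```c
#include <assert.h>
#include <stddef.h>

enum { SHORT_JMP = 2, NEAR_JMP = 5 };   /* EB rel8 / E9 rel32 */

/* Hypothetical instruction record for the sizing pass. */
typedef struct {
    int size;     /* encoded size in bytes */
    int target;   /* index of the jump target, or -1 if not a jump */
} Insn;

/* Byte offset of instruction i (i == n gives the end of the code). */
static long offset_of(const Insn *code, size_t i) {
    long off = 0;
    for (size_t k = 0; k < i; k++)
        off += code[k].size;
    return off;
}

/* Start with every jump short, then widen any jump whose displacement
   does not fit in a signed byte. Repeat until a pass changes nothing.
   Sizes only ever grow, so the loop terminates. Returns the pass count. */
int relax_jumps(Insn *code, size_t n) {
    int passes = 0, changed;
    for (size_t i = 0; i < n; i++)
        if (code[i].target >= 0)
            code[i].size = SHORT_JMP;
    do {
        changed = 0;
        passes++;
        for (size_t i = 0; i < n; i++) {
            if (code[i].target < 0 || code[i].size == NEAR_JMP)
                continue;
            assert((size_t)code[i].target <= n);
            /* x86 displacements are relative to the end of the jump. */
            long disp = offset_of(code, (size_t)code[i].target)
                      - (offset_of(code, i) + code[i].size);
            if (disp < -128 || disp > 127) {
                code[i].size = NEAR_JMP;
                changed = 1;
            }
        }
    } while (changed);
    return passes;
}
```

Widening one jump can push another out of 8-bit range, which is why the loop repeats until nothing changes.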
in other cases, ASM macros may be helpful/convenient.
...
the one (possible) downside is that there may be an overhead required in parsing said ASM, but personally I don't know of any real "sane" app design where this is likely to be a noticeable performance bottleneck...
so, as I see it, textual ASM is generally a win.
it is much like producing object files prior to producing the EXE. it may seem pointless to produce object files and then link them into an EXE, but there are many subtle advantages to having this seemingly trivial extra step in the mix.
> Hi. I'm working on a project that involves translating a proprietary byte-code language into x86 machine code on the fly. Currently this language is interpreted at runtime. By translating the byte-code into x86 instructions at compile-time, we can execute the code natively at runtime using a function pointer to the instructions stored in data.
> Initially I was planning on writing this translation phase myself, since the instruction set of the byte-code language is fairly simple and straightforward. But I wonder, is there an in-memory based x86 assembler available for use that we can build and link into our product, which takes, as input, a buffer of ascii text representing a function written in x86 assembly, and returns the machine code in a separate memory buffer?
> There are free x86 assemblers out there, but I would have to modify the code and build it as a dll. I would also have to modify the output generation code to write the data to a memory buffer. These are things I can do, but I'm wondering if any of you have done that already? Also, this would be for commercial use. And lastly, an assembler that has optimization phases built into it would be a significant plus.
FASM has a DLL version. Although it can generate both 32-bit and 64-bit code, the DLL itself is a 32-bit binary.
> [...] is there an in-memory based x86 assembler available for use that we can build and link into our product, which takes, as input, a buffer of ascii text representing a function written in x86 assembly, and returns the machine code in a separate memory buffer?
Yes... No. I thought I saw one recently. But, this is all I could find. These don't do exactly what you asked: