[Caml-list] Bytecode object files structure

Pierre-Etienne Meunier

Nov 12, 2006, 9:45:12 AM

to caml...@yquem.inria.fr

Hi,

I'm trying to decode .cmo files produced by simple programs, such as
1+1;;
or
print_string "string";;
or
List.length [1;2;3;4;5];;

According to the OCaml sources, there's something called the
"cmo_magic_number", systematically written at the beginning of all .cmo
files. Does it play a real role in executing the programs, or is it just
a way to make sure the file contains OCaml bytecode?

Then, there's the address of what seems to be the last bytecode instruction.
Then, the bytecode instructions, as documented in opcodes.ml.

After that, I can't understand anything : there vaguely seems to be some
information related to linking or so... What is the precise structure of this
part ? Is there some kind of a bytecode assembler ?

Thanks,
P.E. Meunier (pierreetie...@ens-lyon.fr)

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Alain Frisch

Nov 12, 2006, 9:59:03 AM

to Pierre-Etienne Meunier

Pierre-Etienne Meunier wrote:
> According to the OCaml sources, there's something called the
> "cmo_magic_number", systematically written at the beginning of all .cmo
> files. Does it play a real role in executing the programs, or is it just
> a way to make sure the file contains OCaml bytecode?

It is just a way to make sure that the file contains OCaml bytecode of
the expected version.

> After that, I can't understand anything : there vaguely seems to be some
> information related to linking or so... What is the precise structure of this
> part ? Is there some kind of a bytecode assembler ?

The structure is a compilation unit descriptor, described in
bytecomp/cmo_format.mli.
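
As a rough sketch of how the start of a .cmo file can be decoded: as far
as can be told from bytecomp/emitcode.ml, the file begins with a
12-character magic string ("Caml1999O" followed by a three-digit format
version), then a 4-byte big-endian integer giving the position of the
marshalled compilation unit descriptor. That descriptor is written after
the code, which is probably why the integer looks like the address of the
end of the code. The names parse_cmo_header and magic_length below are
made up for this example.

```ocaml
(* Hypothetical helper, not part of the OCaml distribution.
   Assumption: a .cmo image starts with a 12-byte magic string followed
   by a 4-byte big-endian offset to the compilation unit descriptor. *)
let magic_length = 12

(* Split the fixed-size header of a .cmo image held in a string.
   Returns None when the data is too short to contain the header. *)
let parse_cmo_header (data : string) : (string * int) option =
  if String.length data < magic_length + 4 then None
  else
    let magic = String.sub data 0 magic_length in
    (* Read one byte of the offset field as an integer. *)
    let byte i = Char.code data.[magic_length + i] in
    let offset =
      (byte 0 lsl 24) lor (byte 1 lsl 16) lor (byte 2 lsl 8) lor byte 3
    in
    Some (magic, offset)
```

To try it on a real file, one could read the first 16 bytes with
open_in_bin and really_input_string and pass them to parse_cmo_header;
comparing the returned magic string with the one expected by the
compiler is the version check mentioned above.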

-- Alain

Yann Régis-Gianas

Nov 13, 2006, 4:22:39 AM

to Pierre-Etienne Meunier

Hi,

The file tools/dumpobj.ml in the O'Caml tree may be used to parse the
object file. This should be a first step to understand the bytecode
file format.

Hope this helps,

--
Yann Régis-Gianas

Pierre-Etienne Meunier

Nov 13, 2006, 6:41:53 AM

to Xavier Clerc

Hello,

I'd like to write an assembler, to be able to understand how the vm really
works. I've to work on this for a school project (a compiler, I want it to
output caml bytecode object files).

I've understood that the data part, after the code itself, was generated using
output_value (I didn't know this function before). What I don't get now are
the cu_reloc, cu_primitives and cu_imports fields of the compilation_unit
type.

If you can help on this,
Thanks
P.E. Meunier

On Monday 13 November 2006 11:53, Xavier Clerc wrote:
> Hello,
>
> As I have read a substantial part of the OCaml source code, I may be
> able to help you understand the file formats.
> Could you be more precise about what you are particularly interested
> in :
> - file type : bytecode file, cmo file, cmi file ?
> - code or data section of these files ?
>
> May I also ask you what you are trying to do using these elements ?
>
>
> Cordially,
>
> Xavier Clerc
>
> On 12 Nov 06, at 15:42, Pierre-Etienne Meunier wrote:

Xavier Clerc

Nov 15, 2006, 8:56:59 AM

to Pierre-Etienne Meunier

On 13 Nov 06, at 16:50, Pierre-Etienne Meunier wrote:

> Hello,
>
> I'd like to write an assembler, to be able to understand how the vm
> really
> works. I've to work on this for a school project (a compiler, I
> want it to
> output caml bytecode object files).

If you are working on a compiler whose output is to be executed by the
OCaml runtime, you probably do not need to handle cmo/cmi files: the
format of bytecode (executable) files should be sufficient for your
compiler, unless you have to link with OCaml modules.

> I've understood that the data part, after the code itself, was
> generated using
> output_value (I didn't know this function before).

This function is the channel-writing entry point to marshalling (it is
equivalent to Marshal.to_channel with an empty list of flags). It
transforms any non-abstract value into a sequence of bytes.
The marshalling format can be understood from the extern_rec
function in the byterun/extern.c file.
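
A minimal sketch of what marshalling does, using Marshal.to_string and
Marshal.from_string instead of a channel so that the bytes can be
inspected. The record type unit_info is invented for this example and is
much simpler than the real compilation_unit type.

```ocaml
(* Hypothetical record standing in for a compilation unit descriptor;
   the real type lives in bytecomp/cmo_format.mli and has more fields. *)
type unit_info = {
  name : string;
  imports : (string * string) list;  (* module name, digest *)
}

(* Serialise a value to a byte string, as output_value would write it
   to a channel (no marshalling flags). *)
let encode (u : unit_info) : string = Marshal.to_string u []

(* Read the value back. The result type is not checked at run time, so
   the annotation must match what was actually written. *)
let decode (s : string) : unit_info = Marshal.from_string s 0
```

The encoded string starts with a fixed-size header (Marshal.header_size
bytes) followed by the data itself, which is the part that extern_rec
produces.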

> What I don't get now are
> the cu_reloc, cu_primitives and cu_imports fields of the
> compilation_unit
> type.

Remember that cmo files are pieces that will be put together (linked)
in order to create a bytecode file.
Given this context:
- cu_imports lists the names of the imported (used) modules that the
current cmo should be linked with in order to produce a bytecode file
(the digest of each imported module is also kept, to ensure that you
link with the same version you compiled against);
- cu_primitives lists the primitives declared by the current module
(each 'external f : type1 -> type2 = "primitive"' results in a
"primitive" entry in this list); it is needed to ensure that all
required C primitives are provided;
- cu_reloc: as each module is compiled independently, it can declare
some elements (e.g. global variables) and refer to them using a
0-based index; thus, when you link several modules together, you have
to relocate this information so that the first module uses indexes
from 0 to n, the second module uses indexes from n+1 to n+m, and so
on.
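
The relocation idea can be sketched with a toy linker. This is a
deliberately simplified model, not the real types from
bytecomp/cmo_format.mli: toy_unit, global_slots and link are
hypothetical names. As far as I recall, the real cu_reloc pairs a
relocation kind (literal, global read, global write, primitive) with a
position in the code, and the linker patches each such position.

```ocaml
(* Toy model of a separately compiled unit (all names are made up). *)
type toy_unit = {
  code : int array;         (* flattened code; some slots hold global indexes *)
  global_slots : int list;  (* positions in [code] holding a 0-based global index *)
  globals : int;            (* number of globals this unit declares *)
}

(* Concatenate the units' code, shifting every recorded global index by
   the number of globals declared by the units linked before it. *)
let link (units : toy_unit list) : int array =
  let _, chunks =
    List.fold_left
      (fun (base, acc) u ->
        let code = Array.copy u.code in
        (* Patch only the positions listed in the relocation table. *)
        List.iter (fun pos -> code.(pos) <- code.(pos) + base) u.global_slots;
        (base + u.globals, code :: acc))
      (0, []) units
  in
  Array.concat (List.rev chunks)
```

With two units that each declare two globals, the first keeps indexes 0
and 1 while the second ends up using 2 and 3; slots not listed in
global_slots are copied unchanged.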

Hope this helps,

Xavier Clerc

PS : I am working on some documents describing marshalling format,
bytecode files as well as instruction opcodes.
I will hopefully release them before xmas but don't hold your breath
as I don't have much spare time these days.
In the meantime, you can contact me off-list for any related question.
