Newsgroups: fa.caml
From: Xavier Clerc <xcfo...@free.fr>
Date: Wed, 15 Nov 2006 13:56:59 UTC
Local: Wed, Nov 15 2006 8:56 am
Subject: Re: [Caml-list] Bytecode object files structure

On 13 Nov 2006, at 16:50, Pierre-Etienne Meunier wrote:

> Hello,

> I'd like to write an assembler, to be able to understand how the vm really
> works. I've to work on this for a school project (a compiler, I want it to
> output caml bytecode object files).

If your compiler only has to output files that will be executed by
the ocaml runtime, you should not need to handle cmo/cmi files:
knowing the format of bytecode files should be enough to write your
compiler. The exception is if you have to link with ocaml modules.

> I've understood that the data part, after the code itself, was generated using
> output_value (I didn't know this function before).

This function uses the same encoding as the Marshal module. It
transforms any non-abstract value into a sequence of bytes.
The marshalling format can be understood from the extern_rec
function in the byterun/extern.c file.
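
To make this concrete, here is a minimal sketch of an output_value /
input_value round trip. The toy_unit type and the round_trip function are
hypothetical names made up for this illustration; toy_unit is only a
stand-in and is not the compiler's compilation_unit type.

```ocaml
(* Toy stand-in for a descriptor record; hypothetical, for illustration only. *)
type toy_unit = {
  unit_name : string;
  imports : (string * string) list;  (* module name, digest shown as text *)
}

let sample = { unit_name = "Example"; imports = [ ("Other", "0123abcd") ] }

(* Write a value with output_value, then read it back with input_value. *)
let round_trip (v : toy_unit) : toy_unit =
  let path = Filename.temp_file "toy" ".bin" in
  let oc = open_out_bin path in
  output_value oc v;
  close_out oc;
  let ic = open_in_bin path in
  (* The marshalled bytes carry no type information, so the reader must
     state the expected type itself. *)
  let v' : toy_unit = input_value ic in
  close_in ic;
  Sys.remove path;
  v'

let () =
  let copy = round_trip sample in
  Printf.printf "round trip preserved the value: %b\n" (copy = sample)
```

Since the reader has to supply the type, reading a real cmo descriptor this
way only works with a type definition that matches the one used by the
compiler version that wrote the file.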

> What I don't get now are
> the cu_reloc, cu_primitives and cu_imports fields of the compilation_unit
> type.

Keep in mind that cmo files are pieces that get put together (linked)
in order to create a bytecode file.
Given this context:
        - cu_imports lists the names of the imported (used) modules that
the current cmo must be linked with in order to produce a bytecode file
(the digest of each imported module is also kept, to ensure that you
link with the same version you compiled against);
        - cu_primitives lists the primitives declared by the current module
(each 'external f : type1 -> type2 = "primitive"' results in a
"primitive" entry in this list); it is needed to ensure that all
required C primitives are provided;
        - cu_reloc: as each module is compiled independently, it can
declare some elements (e.g. global variables) and refer to them by a
0-based index; thus, when you link several modules together, this
information has to be relocated so that the first module uses indexes
from 0 to n, the second module uses indexes from n+1 to n+m, and so
on.
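
The index shifting described for cu_reloc can be sketched with a toy model.
Everything here (toy_module, relocate, link, the made-up "code" arrays) is
hypothetical and only illustrates the idea; the real linker works on richer
relocation records and this is not its implementation.

```ocaml
(* Toy model of a separately compiled module; hypothetical, not the
   compiler's data structures. *)
type toy_module = {
  mod_name : string;
  globals : int;      (* number of global slots the module declares *)
  code : int array;   (* fake code; some cells hold 0-based global indexes *)
  reloc : int list;   (* positions in [code] holding such indexes *)
}

(* Shift every relocatable cell by the module's base slot. *)
let relocate base m =
  let code = Array.copy m.code in
  List.iter (fun pos -> code.(pos) <- code.(pos) + base) m.reloc;
  code

(* Link modules in order: each module's globals are numbered right after
   those of the modules that precede it. *)
let link modules =
  let _, chunks =
    List.fold_left
      (fun (base, acc) m -> (base + m.globals, relocate base m :: acc))
      (0, []) modules
  in
  Array.concat (List.rev chunks)

(* Both modules use local indexes starting from 0. *)
let a = { mod_name = "A"; globals = 2; code = [| 10; 0; 11; 1 |]; reloc = [ 1; 3 ] }
let b = { mod_name = "B"; globals = 3; code = [| 10; 0; 11; 2 |]; reloc = [ 1; 3 ] }

let linked = link [ a; b ]

let () =
  Printf.printf "linking %s then %s:" a.mod_name b.mod_name;
  Array.iter (Printf.printf " %d") linked;
  print_newline ()
```

In this model module A keeps indexes 0 and 1, while module B's local
indexes 0 and 2 become 2 and 4 because A declared two slots before it.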

Hope this helps,

Xavier Clerc

PS: I am working on some documents describing the marshalling format,
bytecode files, and the instruction opcodes.
I hope to release them before xmas, but don't hold your breath
as I don't have much spare time these days.
In the meantime, you can contact me off-list with any related questions.

> If you can help on this,
> Thanks
> P.E. Meunier

> On Monday 13 November 2006 11:53, you wrote:
>> Hello,

>> As I have read a substantial part of the ocaml source code, I may be
>> able to help you understand the file formats.
>> Could you be more precise about what you are particularly interested
>> in:
>>        - file type: bytecode file, cmo file, cmi file?
>>        - code or data section of these files?

>> May I also ask what you are trying to do with these elements?

>> Cordially,

>> Xavier Clerc

>> On 12 Nov 2006, at 15:42, Pierre-Etienne Meunier wrote:
>>> Hi,

>>> I'm trying to decrypt .cmo files produced by simple programs, such as
>>> 1+1;;
>>> or
>>> print_string "string";;
>>> or
>>> List.length [1;2;3;4;5];;

>>> According to the source of Ocaml, there's something called the
>>> "cmo_magic_number", systematically written at the beginning of all .cmo
>>> files. Does it have a real function for executing the programs, or is it just
>>> a way to make sure the file contains ocaml bytecode?

>>> Then, there's the address of what seems to be the last bytecode instruction.
>>> Then, the bytecode instructions, as documented in opcodes.ml.

>>> After that, I can't understand anything: there vaguely seems to be some
>>> information related to linking or so... What is the precise structure of this
>>> part? Is there some kind of a bytecode assembler?

>>> Thanks,
>>> P.E. Meunier (pierreetienne.meun...@ens-lyon.fr)

>>> _______________________________________________
>>> Caml-list mailing list. Subscription management:
>>> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
>>> Archives: http://caml.inria.fr
>>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>>> Bug reports: http://caml.inria.fr/bin/caml-bugs

