Erlang terms in Ocaml


Fermin Reig

Jun 29, 2007, 7:30:56 PM
to erlocaml-discuss
Hi,

Assume we define an Ocaml representation of erlang terms like the
following:

type term =
| T_atom of string
| T_bool of bool
| T_float of float
| T_list of term list
| ...

Assume we want an implementation where erlang communicates with
ocaml via TCP/IP, sending terms as byte streams, instead of sharing the
same address space. Then I can think of two ways to convert the
terms encoded as binaries to and from the ocaml representation: 1) via
ocaml's FFI, using the erl_interface library that is part of OTP, and
2) implementing the conversion in Ocaml, using jinterface as
inspiration. I've used the FFI before to interface Ocaml with a
dynamically typed language that provides a C interface, but I haven't
used either erl_interface or jinterface. Maybe someone who has could
indicate which of the two suggestions above would be simpler (or
propose a better one).

Fermin

Serge Aleynikov

Jun 29, 2007, 10:32:55 PM
to erlocaml...@googlegroups.com
TCP/IP is the easiest way to interface, however it is not the fastest.
There are three options:

1. Via some network transport (e.g. TCP)
2. Port program
3. Driver

I haven't done any benchmarks between #1 & #2, but as far as I recall #2
is about 10 times slower than #3. In terms of complexity #1 & #2 are of
the same order, whereas #3 is more difficult to implement and test, but
if you are looking for speed, that would be a good incentive.

I would stay away from jinterface for any of these tasks, and stick to
the ei library for #2 and a combination of ei and erl_interface
for #1. #3 requires a somewhat different approach (check out inet_drv.c for
examples of how to marshal data while avoiding extra copying).

On the Ocaml side for #1 & #2 you can use ei interface library and
create C -> Ocaml interface wrappers to deal with Ocaml values. The
most difficult term to encode would probably be a tuple, as Erlang and
Ocaml represent tuples differently. In Ocaml a
tuple cannot be distinguished from an array on the C side (as Ocaml
doesn't quite have RTTI aside from the most generic tags for strings,
doubles, etc), and on the Ocaml side you don't have the luxury of the
element(N, Tuple) and size(Tuple) calls that are available in Erlang.

Serge

Ulf Wiger

Jun 30, 2007, 12:11:20 PM
to erlocaml...@googlegroups.com
I would like to add as an option that the Erlang Term Format
is decoded directly in Ocaml:

It shouldn't be that hard, and I think it might be faster than
using one of the libraries.

BR,
Ulf W


F Reig

Jun 30, 2007, 5:48:50 PM
to erlocaml...@googlegroups.com
On 6/30/07, Ulf Wiger <u...@wiger.net> wrote:
>
> I would like to add as an option that the Erlang Term Format
> is decoded directly in Ocaml:

Is the Erlang Term Format documented somewhere, or do we have to
reverse-engineer it by reading the implementations of ei_decode_* and
ei_encode_* ?

Fermin

Ulf W

Jun 30, 2007, 6:03:50 PM
to erlocaml-discuss

On Jun 30, 11:48 pm, "F Reig" <fermin.r...@gmail.com> wrote:
>
> Is the Erlang Term Format documented somewhere, or do we have to
> reverse-engineer it by reading the implementations of ei_decode_* and
> ei_encode_* ?

It is documented, at least as a text file in the source code
distribution (erts/emulator/internal_doc/erl_ext_dist.txt)

You can find one version online here:
http://software.pupeno.com/pkg/breezy/erlang/erlang/erts/emulator/internal_doc/erl_ext_dist.txt

That seems to be the R10B-0 version, but the format
doesn't change much.

The file also contains descriptions of the Erlang Port Mapper Daemon
Protocol and the Distribution Protocol.

Distel (Distributed Emacs Lisp) implements this protocol directly,
and talks to Erlang nodes that way:

http://distel.googlecode.com/svn/trunk/elisp/erlext.el

BR,
Ulf W

Serge Aleynikov

Jun 30, 2007, 7:08:20 PM
to erlocaml...@googlegroups.com
There's certainly enough confusion around term naming formats:

- Erlang Term Format
- External Binary Format
- Driver Term Format

I think the first one is what's represented by ETERM and used in
erl_interface (see the erl_interface user's guide). The second one is the
result of serializing the internal representation of Erlang terms using
the term_to_binary() and binary_to_term() functions on the Erlang side and
the ei library on the C side. Finally, the Driver Term Format is documented in
the erts / erl_driver section of the documentation. This one is used when
writing drivers and is a fast way of sending data from C to Erlang that
avoids copying and serialization.

Serge

Ulf Wiger

Jul 1, 2007, 2:13:56 AM
to erlocaml...@googlegroups.com
Of course, using driver_send_term() when sending data
from OCaml to Erlang would be nice - fast and flexible
in that it can also send to any Erlang process.

But it will only work for the case where OCaml is
linked into the erlang runtime, right?

Item 1 on the agenda is perhaps to determine the
high-level data representation, independent of
encoding. It wouldn't have to be type-rich. Something
similar in scope to UBF(B) would probably go a long way.
http://www.sics.se/~joe/ubf/site/home.html
(Ignore the "things change frequently" bit - it
hasn't for years.)

BR,
Ulf W

Serge Aleynikov

Jul 1, 2007, 7:17:59 PM
to erlocaml...@googlegroups.com
Ulf Wiger wrote:
> Of course, using driver_send_term() when sending data
> from OCaml to Erlang would be nice - fast and flexible
> in that it can also send to any Erlang process.
>
> But it will only work for the case where OCaml is
> linked into the erlang runtime, right?

Yes, indeed.

> Item 1 on the agenda is perhaps to determine the
> high-level data representation, independent of
> encoding. It wouldn't have to be type-rich. Something
> similar in scope to UBF(B) would probably go a long way.
> http://www.sics.se/~joe/ubf/site/home.html
> (Ignore the "things change frequently" bit - it
> hasn't for years.)

Agreed. Selecting the interface method (TCP/Port/Driver/etc) is less
important at this time. Thinking about the data representation will also
help to get a clearer picture of the capabilities of the bindings in both
languages.

F Reig

Jul 2, 2007, 4:37:29 AM
to erlocaml...@googlegroups.com
On 7/1/07, Ulf Wiger <u...@wiger.net> wrote:
>
> [...]

> Item 1 on the agenda is perhaps to determine the
> high-level data representation, independent of
> encoding. It wouldn't have to be type-rich. Something
> similar in scope to UBF(B) would probably go a long way.
> http://www.sics.se/~joe/ubf/site/home.html
> (Ignore the "things change frequently" bit - it
> hasn't for years.)

So, what do people have in mind as the kind of interface? I'm asking
from the Ocaml point of view. Option 1 is that, at the Ocaml side, you
have a type similar to this

type term =
| T_atom of string
| T_bool of bool
| T_float of float
| T_list of term list
| ...

and everything that comes from the erlang world is of this type. (Of
course, you can have variations on this, such as variants or phantom
types, but the general idea is that of tagged values.)

Option 2 is to use arbitrary Ocaml types in the interface, such as
bool or the record type { name:string; age:int}. What cannot be
accommodated in this model is collections of arbitrary length
containing values of different types. Those would need to be
collections of tagged values.
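
The two options need not exclude each other: a typed interface can be layered on top of the tagged representation. The following is only a minimal sketch. It assumes T_int and T_string constructors (not in the listing above) and a hypothetical wire convention in which a person is sent as the list [Name, Age]; person_of_term and term_of_person are illustrative names, not part of any existing library.

```ocaml
(* Option 1: every value from Erlang arrives as a tagged term.
   T_int and T_string are assumed additions to the type in the post. *)
type term =
  | T_atom of string
  | T_bool of bool
  | T_int of int
  | T_float of float
  | T_string of string
  | T_list of term list

(* Option 2: an ordinary OCaml type exposed to application code. *)
type person = { name : string; age : int }

(* Hypothetical wire convention: a person is sent as [Name, Age].
   Anything else is rejected with None instead of raising. *)
let person_of_term = function
  | T_list [ T_string name; T_int age ] -> Some { name; age }
  | _ -> None

(* The reverse direction, for sending a person back to Erlang. *)
let term_of_person { name; age } = T_list [ T_string name; T_int age ]
```

With this layering the decoder only has to know about term, and each application decides which typed views it wants on top.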

Fermin

Ulf Wiger

Jul 2, 2007, 5:12:42 AM
to erlocaml...@googlegroups.com
I would like to see tuples and binaries as well.
If the Ocaml side can handle Erlang binaries,
it can handle any Erlang term as an opaque
value (the Erlang side would use term_to_binary()
and binary_to_term() to pack and unpack the
value).

If the data being transferred is tagged, you
can of course send binaries as strings, and
tuples as lists.
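
For the OCaml-to-Erlang direction, a rough sketch of an encoder that produces bytes binary_to_term() should accept. It assumes the tag values from erl_ext_dist.txt (100 atom, 98 integer, 109 binary, 104 small tuple), the T_binary and T_tuple constructors proposed elsewhere in this thread, and a 64-bit OCaml; add_be, encode and term_to_binary are names made up for the example.

```ocaml
type term =
  | T_atom of string
  | T_int of int32
  | T_binary of string
  | T_tuple of term array

(* Append the low [bytes] bytes of [n] in big-endian order.
   Assumes a 64-bit OCaml so that 32-bit values fit in an int. *)
let add_be buf bytes n =
  for i = bytes - 1 downto 0 do
    Buffer.add_char buf (Char.chr ((n lsr (8 * i)) land 0xff))
  done

let rec encode buf = function
  | T_atom a ->
      (* ATOM_EXT: tag 100, 2-byte length, name *)
      Buffer.add_char buf '\100';
      add_be buf 2 (String.length a);
      Buffer.add_string buf a
  | T_int i ->
      (* INTEGER_EXT: tag 98, 4-byte big-endian signed value *)
      Buffer.add_char buf '\098';
      add_be buf 4 (Int32.to_int i)
  | T_binary b ->
      (* BINARY_EXT: tag 109, 4-byte length, raw bytes *)
      Buffer.add_char buf '\109';
      add_be buf 4 (String.length b);
      Buffer.add_string buf b
  | T_tuple elts ->
      (* SMALL_TUPLE_EXT: tag 104, 1-byte arity; larger tuples not handled *)
      if Array.length elts > 255 then invalid_arg "encode: tuple too large";
      Buffer.add_char buf '\104';
      add_be buf 1 (Array.length elts);
      Array.iter (encode buf) elts

(* Prefix the version byte 131, as Erlang's term_to_binary() does. *)
let term_to_binary t =
  let buf = Buffer.create 64 in
  Buffer.add_char buf '\131';
  encode buf t;
  Buffer.contents buf
```

An opaque Erlang term would then travel as a T_binary whose payload is itself the output of term_to_binary() on the Erlang side.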

BR,
Ulf W

Serge Aleynikov

Jul 2, 2007, 9:17:11 AM
to erlocaml...@googlegroups.com
One way to represent Erlang tuples could be to treat them as arrays,
whereas binaries could be represented as strings (OCaml strings can
contain 0-byte character as they don't rely on it being the
end-of-string signifier):

type term =
...
| T_tuple of term array
| T_binary of string

Why not map Erlang tuples to OCaml tuples? Say we have an
OCaml function that returns a tuple from Erlang: it would have to have a
fixed signature with a given number of elements in the return value:

val get_tuple : ... -> term * term

This is because OCaml statically determines the shape of the tuple, and
the only way to work with the result of such a function is by pattern
matching on its shape:

match get_tuple ... with
| (T_atom "ok", T_int i) -> ...
| (T_atom "error", T_string s) -> ...

Alternatively we can reserve a custom tuple type "pair" for marshaling
two-value tuples, on which OCaml offers fst and snd value retrieval
functions, and use arrays for N-tuples:

Erlang          -> OCaml
{Key, Value}    -> ('a, 'b)
{A, B, ..., N}  -> [|a; b; ...; n|]

type term =
...
| T_pair of term * term
| T_tuple of term array
...
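
To illustrate how the pair/array split might be used, here is a small sketch. It assumes T_int, T_string and T_list constructors alongside T_pair and T_tuple; lookup and describe are hypothetical helper names chosen for the example.

```ocaml
type term =
  | T_atom of string
  | T_int of int32
  | T_string of string
  | T_list of term list
  | T_pair of term * term
  | T_tuple of term array

(* Look up [key] in an Erlang-style property list of {Key, Value} pairs.
   Elements that are not pairs are skipped. *)
let rec lookup key = function
  | T_list (T_pair (k, v) :: _) when k = key -> Some v
  | T_list (_ :: rest) -> lookup key (T_list rest)
  | _ -> None

(* N-tuples arrive as arrays, so their arity is checked at run time
   by the array pattern rather than statically by the type checker. *)
let describe = function
  | T_tuple [| T_atom "error"; T_string reason; T_int code |] ->
      Printf.sprintf "error %ld: %s" code reason
  | T_pair (T_atom "ok", _) -> "ok"
  | _ -> "unrecognised"
```

The trade-off is that a 2-tuple and a 3-tuple now live under different constructors, so a decoder has to pick one by arity.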

Serge

Ulf Wiger

Jul 2, 2007, 10:44:14 AM
to erlocaml...@googlegroups.com
2007/7/2, Serge Aleynikov <sal...@gmail.com>:

>
> Alternatively we can reserve a custom tuple type "pair" for marshaling
> two-value tuples, on which OCaml offers fst and snd value retrieval
> functions, and use arrays for N-tuples:
>
> Erlang -> OCaml
> {Key, Value} -> ('a, 'b)
> {A, B, ..., N} -> [|a; b; ...; n|]
>
> type term =
> ...
> | T_pair of term * term
> | T_tuple of term array

Yes. Then {Key,Value} lists can be conveniently
handled. I'm a big fan of proplists. (-:

BR,
Ulf W

F Reig

Jul 2, 2007, 5:32:50 PM
to erlocaml...@googlegroups.com
On 6/30/07, Ulf Wiger <u...@wiger.net> wrote:
>
> I would like to add as an option that the Erlang Term Format
> is decoded directly in Ocaml:
>
> It shouldn't be that hard, and I think it might be faster than
> using one of the libraries.

Using Ocaml stream parsers it can be done quite succinctly. Here's a
partial implementation:

(* Requires the camlp4 stream parser syntax extension. *)
open Stream
open ExtString.String (* part of Extlib *)

type term =
  | T_int of int32
  | T_float of float
  | T_atom of string
  | T_bool of bool
  (* etc *)

let atom_or_bool = function
  | "true" -> T_bool true
  | "false" -> T_bool false
  | str -> T_atom str

let rec bin_term =
  parser [< ''\131'; t = term; rest = empty ?? "binary term too long" >] -> t

and term =
  parser
  (* unsigned int8 *)
  | [< ''\097'; i = uint8 >] -> T_int (Int32.of_int i)
  (* int32, big-endian *)
  | [< ''\098'; b0 = uint8; b1 = uint8; b2 = uint8; b3 = uint8 >] ->
      (* b0 * 256^3 + b1 * 256^2 + b2*256 + b3; wraps to negative when b0 >= 128 *)
      let i = Int32.of_int (256*256*256) in
      let i = Int32.mul i (Int32.of_int b0) in
      let i = Int32.add i (Int32.of_int (b1*256*256 + b2*256 + b3)) in
      T_int i
  (* float: 31 bytes of text, padded with NUL bytes that must be cut off *)
  | [< ''\099'; s = string_n 31 >] ->
      let len = try String.index s '\000' with Not_found -> 31 in
      T_float (float_of_string (String.sub s 0 len))
  (* atom or bool; the length is a 2-byte big-endian integer *)
  | [< ''\100'; b0 = uint8; b1 = uint8; atom = string_n (b0*256 + b1) >] ->
      atom_or_bool atom

and uint8 =
  parser [< 'i >] -> int_of_char i

and string_n n =
  (* discard exactly n characters after peeking at them *)
  let njunk n stream = for i = 1 to n do junk stream done in
  parser [< s = npeek n; rest = njunk n >] -> implode s


I've tested it with the values you get from erlang's term_to_binary
and it works fine.

# bin_term (Stream.of_string "\131\098\000\000\255\255");;
- : Erl_term.term = T_int 65535l

# bin_term (Stream.of_string "\131\100\000\003\097\098\099");;
- : Erl_term.term = T_atom "abc"

# bin_term (Stream.of_string "\131\100\000\005\102\097\108\115\101");;
- : Erl_term.term = T_bool false
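
The parser keyword above depends on the camlp4 syntax extension. For comparison, here is a sketch of the same four tags decoded with only the standard library, walking a string by position. This is an alternative illustration rather than the code from this post. It assumes a 64-bit OCaml; Decode_error, be_uint and decode_term are names invented for the example, and truncated input surfaces as Invalid_argument from the string functions.

```ocaml
type term =
  | T_int of int32
  | T_float of float
  | T_atom of string
  | T_bool of bool

exception Decode_error of string

let atom_or_bool = function
  | "true" -> T_bool true
  | "false" -> T_bool false
  | str -> T_atom str

(* Read [n] bytes of [s] at [pos] as a big-endian unsigned integer. *)
let be_uint s pos n =
  let v = ref 0 in
  for i = 0 to n - 1 do
    v := (!v lsl 8) lor Char.code s.[pos + i]
  done;
  !v

(* Decode one term starting at [pos]; return it with the next position. *)
let decode_term s pos =
  match s.[pos] with
  | '\097' -> (T_int (Int32.of_int (be_uint s (pos + 1) 1)), pos + 2)
  | '\098' ->
      (* Int32.of_int keeps the low 32 bits, which restores the sign *)
      (T_int (Int32.of_int (be_uint s (pos + 1) 4)), pos + 5)
  | '\099' ->
      (* 31 bytes of text, NUL-padded on the right *)
      let raw = String.sub s (pos + 1) 31 in
      let len = try String.index raw '\000' with Not_found -> 31 in
      (T_float (float_of_string (String.sub raw 0 len)), pos + 32)
  | '\100' ->
      let len = be_uint s (pos + 1) 2 in
      (atom_or_bool (String.sub s (pos + 3) len), pos + 3 + len)
  | c -> raise (Decode_error (Printf.sprintf "unsupported tag %d" (Char.code c)))

(* Check the version byte, decode one term, and reject leftover bytes. *)
let bin_term s =
  if String.length s = 0 || s.[0] <> '\131' then
    raise (Decode_error "missing version byte 131");
  let t, next = decode_term s 1 in
  if next <> String.length s then raise (Decode_error "trailing bytes");
  t
```

Called on the three byte strings above, for example bin_term "\131\100\000\003\097\098\099", it is meant to give the same results as the stream version.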

Fermin

khigia

Nov 26, 2007, 7:33:38 AM
to F Reig, erlocaml...@googlegroups.com
Hello everyone,
I'm reading the discussions in this group ... quite interesting in
fact! :)

I've been using Erlang for only a year, and I'm a complete newbie in
OCaml ... but would love to spend some time coding on this project! My
idea is to try to do something similar to py_interface (an erlang node
implemented in python, communicating through an erlang port ... if I'm
not wrong). I have absolutely no idea how much work it would take to
replicate this in ocaml ... I guess a first step would be to reuse the
previous ocaml stream decoding and communicate through a TCP
connection.

So I have a few questions ...
First, is this too ambitious as a project for learning OCaml? ;)
Second, does anybody have some code already?

Any advice/guidance is welcome!

Thanks,
ludo