Assume we define an OCaml representation of Erlang terms like the
following:
  type term =
    | T_atom of string
    | T_bool of bool
    | T_float of float
    | T_list of term list
    | ...
Assume we want an implementation where Erlang communicates with
OCaml over TCP/IP, sending terms as byte streams, instead of sharing the
same address space. Then I can think of two ways to convert the
terms encoded as binaries to and from the OCaml representation: 1) via
OCaml's FFI, using the erl_interface library that is part of OTP, and
2) implementing the conversion in OCaml, using jinterface as
inspiration. I've interfaced a dynamically typed language that
provides a C interface with OCaml before, using the FFI, but I haven't
used erl_interface or jinterface. Maybe someone who has could
indicate which of my suggestions above would be simpler (or provide a
better one).
Fermin
1. Via some network transport (e.g. TCP)
2. Port program
3. Driver
I haven't done any benchmarks comparing #1 and #2, but as far as I
recall #2 is about 10 times slower than #3. In terms of complexity, #1
and #2 are of the same order, whereas #3 is more difficult to implement
and test; if you are looking for speed, though, that would be a good
incentive.
I would stay away from jinterface for any of these tasks, and stick to
the ei interface library for #2 and a combination of ei and
erl_interface for #1. #3 requires a somewhat different approach (check
out inet_drv.c for examples of how to marshal data while avoiding extra
copying).
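For #1 and #2 the byte stream has to be split into messages somehow. A minimal sketch, assuming the Erlang side opens the socket or port with the {packet, 2} option, so that every message is preceded by a 2-byte big-endian length. The names read_packet and write_packet are made up for this illustration and are not part of ei or erl_interface:

```ocaml
(* Assumption: the peer uses {packet, 2}, i.e. each message is preceded
   by a 2-byte big-endian length header. *)

(* Read one framed message; None on a clean end of input. *)
let read_packet (ic : in_channel) : string option =
  match input_char ic with
  | exception End_of_file -> None
  | hi ->
      let lo = input_char ic in
      let len = (Char.code hi lsl 8) lor Char.code lo in
      Some (really_input_string ic len)

(* Write one framed message and flush so the peer sees it immediately. *)
let write_packet (oc : out_channel) (payload : string) : unit =
  let len = String.length payload in
  if len > 0xFFFF then invalid_arg "write_packet: payload too long";
  output_char oc (Char.chr (len lsr 8));
  output_char oc (Char.chr (len land 0xFF));
  output_string oc payload;
  flush oc
```

In a port program these would be applied to stdin and stdout (switched to binary mode with set_binary_mode_in and set_binary_mode_out); over TCP, to the channels obtained from the socket.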
On the OCaml side, for #1 and #2 you can use the ei interface library
and create C -> OCaml interface wrappers to deal with OCaml values. The
most difficult term to encode would probably be a tuple, as Erlang and
OCaml represent tuples with different semantics. In OCaml a
tuple cannot be distinguished from an array on the C side (OCaml
doesn't quite have RTTI, aside from the most generic tags for strings,
doubles, etc.), and on the OCaml side you don't have the luxury of the
element(N, Tuple) and size(Tuple) calls that are available in Erlang.
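One possible way around this, assuming tuples are carried as arrays inside a tagged term type (the T_tuple constructor and the function names below are assumptions made for this sketch), is to provide the two calls on the OCaml side:

```ocaml
(* Hypothetical tagged representation; only the cases needed here. *)
type term =
  | T_atom of string
  | T_int of int32
  | T_tuple of term array

(* Analogue of Erlang's size(Tuple). *)
let size = function
  | T_tuple a -> Array.length a
  | _ -> invalid_arg "size: not a tuple"

(* Analogue of Erlang's element(N, Tuple); N is 1-based, as in Erlang. *)
let element n = function
  | T_tuple a when n >= 1 && n <= Array.length a -> a.(n - 1)
  | T_tuple _ -> invalid_arg "element: index out of range"
  | _ -> invalid_arg "element: not a tuple"
```

The price is that the arity is only checked at run time, which is the same trade-off Erlang makes.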
Serge
It shouldn't be that hard, and I think it might be faster than
using one of the libraries.
BR,
Ulf W
2007/6/30, Serge Aleynikov <sal...@gmail.com>:
Is the Erlang Term Format documented somewhere, or do we have to
reverse-engineer it by reading the implementations of ei_decode_* and
ei_encode_* ?
Fermin
It is documented, at least as a text file in the source code
distribution (erts/emulator/internal_doc/erl_ext_dist.txt)
You can find one version online here:
http://software.pupeno.com/pkg/breezy/erlang/erlang/erts/emulator/internal_doc/erl_ext_dist.txt
That seems to be the R10B-0 version, but the format
doesn't change much.
The file also contains descriptions of the Erlang Port Mapper Daemon
Protocol and the Distribution Protocol.
Distel (Distributed Emacs Lisp) implements this protocol directly,
and talks to Erlang nodes that way:
http://distel.googlecode.com/svn/trunk/elisp/erlext.el
BR,
Ulf W
- Erlang Term Format
- External Binary Format
- Driver Term Format
I think the first one is what's represented by ETERM and used in
erl_interface (see the erl_interface user's guide). The second one is
the result of serializing the internal representation of Erlang terms,
using the term_to_binary() and binary_to_term() functions on the Erlang
side and the ei interface on the C side. Finally, the Driver Term
Format is documented in the erts / erl_driver section of the
documentation. It is used when writing drivers and is a fast way of
sending data from C to Erlang that avoids copying and serialization.
Serge
But it will only work for the case where OCaml is
linked into the Erlang runtime, right?
Item 1 on the agenda is perhaps to determine the
high-level data representation, independent of
encoding. It wouldn't have to be type-rich. Something
similar in scope to UBF(B) would probably go a long way.
http://www.sics.se/~joe/ubf/site/home.html
(Ignore the "things change frequently" bit - it
hasn't for years.)
BR,
Ulf W
2007/7/1, Serge Aleynikov <sal...@gmail.com>:
Yes, indeed.
> Item 1 on the agenda is perhaps to determine the
> high-level data representation, independent of
> encoding. It wouldn't have to be type-rich. Something
> similar in scope to UBF(B) would probably go a long way.
> http://www.sics.se/~joe/ubf/site/home.html
> (Ignore the "things change frequently" bit - it
> hasn't for years.)
Agreed. Selecting the interface method (TCP/Port/Driver/etc.) is of
less importance at this time. Thinking about the data representation
will also help to get a clearer picture of the capabilities of the
bindings in both languages.
So, what do people have in mind as the kind of interface? I'm asking
from the OCaml point of view. Option 1 is that, on the OCaml side, you
have a type similar to this:
  type term =
    | T_atom of string
    | T_bool of bool
    | T_float of float
    | T_list of term list
    | ...
and everything that comes from the Erlang world is of this type. (Of
course, you can have variations on this, such as variants or phantom
types, but the general idea is that of tagged values.)
Option 2 is to use arbitrary OCaml types in the interface, such as
bool or the record type { name:string; age:int }. What cannot be
accommodated in this model is collections of arbitrary length
containing values of different types. Those would need to be
collections of tagged values.
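The two options can also be layered: tagged values on the wire, with hand-written conversions to ordinary OCaml types at the edges. A rough sketch, where the Erlang-side shape {person, Name, Age}, the extra constructors and the function names are all assumptions made for the example:

```ocaml
(* Option 1: everything from Erlang arrives as a tagged value. *)
type term =
  | T_atom of string
  | T_int of int32
  | T_string of string
  | T_tuple of term array

(* Option 2: the application works with an ordinary record. *)
type person = { name : string; age : int }

(* Accepts only the hypothetical Erlang tuple {person, Name, Age}. *)
let person_of_term = function
  | T_tuple [| T_atom "person"; T_string name; T_int age |] ->
      Some { name; age = Int32.to_int age }
  | _ -> None

(* Inverse conversion, producing the same tuple shape. *)
let term_of_person { name; age } =
  T_tuple [| T_atom "person"; T_string name; T_int (Int32.of_int age) |]
```

Anything that does not match the expected shape yields None, so a mismatch surfaces at the boundary rather than deep inside the application.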
Fermin
If the data being transferred is tagged, you
can of course send binaries as strings, and
tuples as lists.
BR,
Ulf W
2007/7/2, F Reig <fermi...@gmail.com>:
  type term =
    ...
    | T_tuple of term array
    | T_binary of string
Why wouldn't we map Erlang tuples to OCaml tuples? Say we have an
OCaml function that returns a tuple from Erlang: it would have to have a
fixed signature with a given number of elements in the return value:
  let get_tuple : ... -> term * term
This is because OCaml statically determines the shape of the tuple, and
the only way to work with the result of such a function is by pattern
matching on its shape:
  match get_tuple ... with
  | (T_atom "ok", T_int i) -> ...
  | (T_atom "error", T_string s) -> ...;;
Alternatively we can reserve a custom tuple type "pair" for marshaling
two-value tuples, on which OCaml offers fst and snd value retrieval
functions, and use arrays for N-tuples:
  Erlang          -> OCaml
  {Key, Value}    -> (key, value)
  {A, B, ..., N}  -> [|a; b; ...; n|]
  type term =
    ...
    | T_pair of term * term
    | T_tuple of term array
    ...
Serge
Yes. Then {Key,Value} lists can be conveniently
handled. I'm a big fan of proplists. (-:
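A minimal sketch of what that could look like with the T_pair constructor proposed above. The lookup function is made up for the illustration, and unlike the proplists module it simply skips elements that are not pairs:

```ocaml
type term =
  | T_atom of string
  | T_int of int32
  | T_list of term list
  | T_pair of term * term
  | T_tuple of term array

(* Return the value of the first {Key, Value} pair whose key equals
   [key]; elements that are not pairs are ignored. *)
let rec lookup key = function
  | T_list (T_pair (k, v) :: _) when k = key -> Some v
  | T_list (_ :: rest) -> lookup key (T_list rest)
  | _ -> None
```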
BR,
Ulf W
Using OCaml stream parsers it can be done quite succinctly. Here's a
partial implementation:
  open Stream
  open ExtString.String (* part of Extlib; provides implode *)

  type term =
    | T_int of int32
    | T_float of float
    | T_atom of string
    | T_bool of bool
    (* etc *)

  (* Erlang has no boolean type: true and false are ordinary atoms. *)
  let atom_or_bool = function
    | "true" -> T_bool true
    | "false" -> T_bool false
    | str -> T_atom str

  (* A complete binary: version byte 131, one term, then end of input. *)
  let rec bin_term =
    parser [< ''\131'; t = term; rest = empty ?? "binary term too long" >] -> t
  and term =
    parser
    (* unsigned int8 *)
    | [< ''\097'; i = uint8 >] -> T_int (Int32.of_int i)
    (* int32, big-endian *)
    | [< ''\098'; b0 = uint8; b1 = uint8; b2 = uint8; b3 = uint8 >] ->
        (* b0 * 256^3 + b1 * 256^2 + b2*256 + b3 *)
        let i = Int32.of_int (256*256*256) in
        let i = Int32.mul i (Int32.of_int b0) in
        let i = Int32.add i (Int32.of_int (b1*256*256 + b2*256 + b3)) in
        T_int i
    (* float: 31 bytes of text, padded with NUL bytes that
       float_of_string would reject, so cut at the first NUL *)
    | [< ''\099'; s = string_n 31 >] ->
        let s = try String.sub s 0 (String.index s '\000') with Not_found -> s in
        T_float (float_of_string s)
    (* atom or bool, with a 2-byte big-endian length *)
    | [< ''\100'; b0 = uint8; b1 = uint8; atom = string_n (b0*256 + b1) >] ->
        atom_or_bool atom
  and uint8 =
    parser [< 'i >] -> int_of_char i
  and string_n n =
    (* discard exactly n elements (0 to n would discard n+1) *)
    let njunk n stream = for _i = 1 to n do junk stream done in
    parser [< s = npeek n; rest = njunk n >] -> implode s
I've tested it with the values you get from Erlang's term_to_binary
and it works fine.
# bin_term (Stream.of_string "\131\098\000\000\255\255");;
- : Erl_term.term = T_int 65535l
# bin_term (Stream.of_string "\131\100\000\003\097\098\099");;
- : Erl_term.term = T_atom "abc"
# bin_term (Stream.of_string "\131\100\000\005\102\097\108\115\101");;
- : Erl_term.term = T_bool false
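For the opposite direction, here is a rough sketch of an encoder for the same tags, using only the standard library. The OCaml function name term_to_binary is chosen just to mirror the Erlang one, floats are left out, and the 255-character limit on atoms is an assumption about the atom tag used above:

```ocaml
type term =
  | T_int of int32
  | T_atom of string
  | T_bool of bool

(* Append n as 4 big-endian bytes. *)
let add_int32_be buf n =
  for shift = 3 downto 0 do
    let byte = Int32.to_int (Int32.shift_right_logical n (8 * shift)) land 0xFF in
    Buffer.add_char buf (Char.chr byte)
  done

(* Tag 100, 2-byte big-endian length, then the atom's characters. *)
let add_atom buf name =
  let len = String.length name in
  if len > 255 then invalid_arg "add_atom: atom too long";
  Buffer.add_char buf '\100';
  Buffer.add_char buf (Char.chr (len lsr 8));
  Buffer.add_char buf (Char.chr (len land 0xFF));
  Buffer.add_string buf name

let add_term buf = function
  (* values 0..255 fit the one-byte form, tag 97 *)
  | T_int n when Int32.compare n 0l >= 0 && Int32.compare n 255l <= 0 ->
      Buffer.add_char buf '\097';
      Buffer.add_char buf (Char.chr (Int32.to_int n))
  (* everything else uses the 4-byte form, tag 98 *)
  | T_int n ->
      Buffer.add_char buf '\098';
      add_int32_be buf n
  | T_atom name -> add_atom buf name
  (* booleans travel as the atoms true and false *)
  | T_bool b -> add_atom buf (string_of_bool b)

(* Version byte 131 followed by the encoded term. *)
let term_to_binary t =
  let buf = Buffer.create 16 in
  Buffer.add_char buf '\131';
  add_term buf t;
  Buffer.contents buf
```

Applied to T_int 65535l, T_atom "abc" and T_bool false, it gives back the three byte strings fed to bin_term above.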
Fermin