tight = complete or efficient, simple?

jsnX

Jul 12, 2007, 10:56:52 PM
to erlocaml-discuss
Although it would be nice to have all the Erlang datatypes represented
in OCaml, it might save us a lot of trouble to go the other way round
-- to have an 'ocaml' module that allowed us to make calls into OCaml
functions and construct OCaml values (of very primitive types -- lists
of Int and so forth).

The values received would be utterly opaque to Erlang (here I am
echoing Ulf Wiger) -- the only handling allowed by Erlang would be:

  binary comprehensions
  passing opaque values from one OCaml node to another

So maybe if the result of an OCaml function was, say, a list of bytes
that was an HTML page, Erlang would know enough to append a content-
type and pass it to the user -- but no parsing that page or anything
else on the Erlang side! OCaml, on the other hand, would have no
interface to Erlang at all -- it would just pass bytes to its master.

This approach is, I think, the shortest path to joining the strengths
of both languages -- Erlang gets to manage processes and
communication, whereas OCaml may merrily crunch away at the data
without regard for its neighbors. I think the only tricky part is the
garbage collection -- we need the OCaml processes to punt on garbage
collecting whatever data has been shipped out to Erlang, as a part of
a more general prohibition on modifying any such value. The less
Erlang and OCaml know about each other, the better.

khigia

Nov 26, 2007, 10:18:23 AM
to jsnX, erlocaml...@googlegroups.com
In a first version, could we see the data passed between erlang and
ocaml as messages? (like erlang messages). That way, there is no GC
problem, is there?
That way, some messages can be opaque to erlang (seen as binary) but
some others could make sense (integer, float ...).
argh, this seems too simple, did I miss your point?

Jason Dusek

Nov 26, 2007, 2:11:44 PM
to khigia, erlocaml...@googlegroups.com
On Nov 26, 2007 7:18 AM, khigia <lcoq...@gmail.com> wrote:
> In a first version, could we see the data passed between
> erlang and ocaml as messages? (like erlang messages). That
> way, there is no GC problem, is there?

That seems right, but consider this scenario:

An Erlang node manages two O'Caml programs -- one to compute
the values of a matrix from the previous values (so it's
encapsulated within an Erlang process with a heartbeat) and
one to derive certain quantities from this matrix for display.

  master (erlang)
    heartbeat (erlang)
      calculator (o'caml)
    poller (erlang)
      deriver (o'caml)

So, for data to get to the deriver, we might have:

  get data from calculator into heartbeat (results in a copy)
  pass data from heartbeat to poller (another copy)
  pass data from poller to deriver (a third copy)

The middle copy is the result of the way that Erlang handles
data passed between processes (though maybe HiPE averts this).
However, the first and last copy are due to the absence of a
shared memory approach.

> That way, some messages can be opaque to erlang (seen as
> binary) but some others could make sense (integer, float ...).
> argh, this seems too simple, did I miss your point?

Erlangers, correct me if I am wrong, but integer and float might
appear all wrong when they go from Erlang to O'Caml or the other
way -- Erlang's integer is truncated a bit, to make space for a
little tag in front, and the float type in Erlang is 4 words
long or something like that.
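
One way to sidestep this kind of mismatch is to put numbers on the wire in an explicit fixed-width layout instead of either runtime's in-memory form (OCaml's native int is itself tagged, 63 bits on a 64-bit platform). A minimal sketch, assuming OCaml 4.08 or later; the names `encode_pair` and `decode_pair` and the 16-byte layout are made up for illustration:

```ocaml
(* Hypothetical wire layout: an 8-byte big-endian signed integer followed
   by an 8-byte big-endian IEEE 754 double. Needs OCaml >= 4.08. *)
let encode_pair (n : int) (x : float) : bytes =
  let b = Bytes.create 16 in
  Bytes.set_int64_be b 0 (Int64.of_int n);
  Bytes.set_int64_be b 8 (Int64.bits_of_float x);
  b

(* Inverse of encode_pair; integers outside the native int range would be
   truncated by Int64.to_int. *)
let decode_pair (b : bytes) : int * float =
  let n = Int64.to_int (Bytes.get_int64_be b 0) in
  let x = Int64.float_of_bits (Bytes.get_int64_be b 8) in
  (n, x)

let () =
  let n, x = decode_pair (encode_pair (-42) 3.5) in
  Printf.printf "%d %g\n" n x
```

On the Erlang side, a binary in this layout could be matched with something like `<<N:64/signed-big, X:64/float-big>>`.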

Really, Erlang's value proposition lies in managing the transfer
of data between O'Caml compute processes. Maybe it is best to
enforce the opacity of all inter-O'Caml messages -- we're using
Erlang for IPC and job control.

--
_jsn

Ludovic Coquelle

Nov 26, 2007, 10:06:11 PM
to Jason Dusek, erlocaml...@googlegroups.com
OK, you're right. I got your point about multiple copies of the data.
If we need to pass data between processes, we need to avoid copying when possible. I guess this implies using the driver interface between erlang and ocaml (and dealing with the two-GC problem).

As far as making things opaque for erlang, I still think that we don't have to enforce it: if ocaml wants to exchange opaque data, it can just create an erlang binary ... erlang doesn't try to interpret it. But we can keep all the erlang terms, so that if ocaml only wants to give an integer result back to erlang, that is still possible. Does this make sense?

By the way, I have no experience at all with erlang drivers ... but I would like to try first to implement an erlang port.

ludo

Jason Dusek

Nov 26, 2007, 10:38:23 PM
to Ludovic Coquelle, erlocaml...@googlegroups.com
Ludovic Coquelle <lcoq...@gmail.com> wrote:
> OK, you're right. I got your point about multiple copies of
> the data. If we need to pass data between processes, we need
> to avoid copying when possible. I guess this implies using the
> driver interface between erlang and ocaml (and dealing with
> the two-GC problem).
>
> As far as making things opaque for erlang, I still think that
> we don't have to enforce it: if ocaml wants to exchange opaque
> data, it can just create an erlang binary ... erlang doesn't
> try to interpret it. But we can keep all the erlang terms,
> so that if ocaml only wants to give an integer result back to
> erlang, that is still possible. Does this make sense?

If we enforce opacity, then when we do Erlskell, it will go that
much smoother. Erlang could be the Job Control Language of the
future...

However, the 'binary streams' approach requires something from
both sides. Reference-based structures in O'Caml can't be passed
to another O'Caml node -- since they don't share memory space --
and in consequence, everything needs to be pickled to go through
Erlang. The really sharable data structures are things that can
be mmapped -- int[], structs, stuff like that.

Once data is pickled, Erlang could unpickle it -- but that means
a copy. In general, we won't be calling out to O'Caml nodes for
single Int32s -- using a library is probably a better idea in
that case.
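
The pickling step described above can be sketched with OCaml's standard `Marshal` module, which flattens a pointer-linked value into a plain byte string that Erlang could carry as an opaque binary. This is only an illustration: `pickle` and `unpickle` are hypothetical names, and `Marshal` is not type-safe, so both O'Caml ends have to agree on the type out of band.

```ocaml
(* Flatten a value into a self-contained byte string. With the empty flag
   list, closures are rejected, which suits plain data. *)
let pickle (v : 'a) : string = Marshal.to_string v []

(* Unsafe: the caller must know the type that was pickled. *)
let unpickle (s : string) : 'a = Marshal.from_string s 0

let () =
  let matrix = [ [| 1.0; 2.0 |]; [| 3.0; 4.0 |] ] in
  let wire = pickle matrix in
  (* The annotation is the only thing telling OCaml what comes back. *)
  let (back : float array list) = unpickle wire in
  assert (back = matrix);
  Printf.printf "pickled %d bytes\n" (String.length wire)
```

Note that the round trip itself builds a fresh copy of the structure on the receiving side, which is the cost being discussed here.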

It's been a long time since I've written any Erlang :) I'm glad
to see this thread alive again!

--
_jsn

Ludovic Coquelle

Nov 26, 2007, 10:58:34 PM
to Jason Dusek, erlocaml...@googlegroups.com
On 11/27/07, Jason Dusek <jason...@gmail.com> wrote:
> Ludovic Coquelle <lcoq...@gmail.com> wrote:
> > OK, you're right. I got your point about multiple copies of
> > the data. If we need to pass data between processes, we need
> > to avoid copying when possible. I guess this implies using the
> > driver interface between erlang and ocaml (and dealing with
> > the two-GC problem).
> >
> > As far as making things opaque for erlang, I still think that
> > we don't have to enforce it: if ocaml wants to exchange opaque
> > data, it can just create an erlang binary ... erlang doesn't
> > try to interpret it. But we can keep all the erlang terms,
> > so that if ocaml only wants to give an integer result back to
> > erlang, that is still possible. Does this make sense?
>
> If we enforce opacity, then when we do Erlskell, it will go that
> much smoother. Erlang could be the Job Control Language of the
> future...

Wow! I'm only just beginning to learn OCaml, so I can't see as far as using Erlang + OCaml + Haskell :)

> However, the 'binary streams' approach requires something from
> both sides. Reference-based structures in O'Caml can't be passed
> to another O'Caml node -- since they don't share memory space --
> and in consequence, everything needs to be pickled to go through
> Erlang. The really sharable data structures are things that can
> be mmapped -- int[], structs, stuff like that.

What would you think of a special API for sharable data? Is this even possible?

> Once data is pickled, Erlang could unpickle it -- but that means
> a copy. In general, we won't be calling out to O'Caml nodes for
> single Int32s -- using a library is probably a better idea in
> that case.

The use case I have in mind is as follows:
An OCaml node receives pieces of data from multiple erlang processes (it could be a financial system retrieving info from lots of different services, or a log analyzer receiving info from different servers ...) and stores them, or computes on them, in its own data structure. Then any Erlang node could ask the OCaml node for the result of a computation on that data (which may be a complex computation for which ocaml is better suited than erlang).
Does this make sense?
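
A rough sketch of the OCaml side of that use case, leaving out the transport entirely. Everything here is hypothetical: the `store` table, the `record` and `mean` functions, and the choice of a per-source mean as the "computation". It assumes OCaml 4.08 or later for `Option.value`.

```ocaml
(* Hypothetical in-memory store: running sum and count per source name. *)
let store : (string, float * int) Hashtbl.t = Hashtbl.create 16

(* Called for each piece of data that an Erlang process sends in. *)
let record (source : string) (value : float) : unit =
  let sum, count =
    Option.value (Hashtbl.find_opt store source) ~default:(0.0, 0)
  in
  Hashtbl.replace store source (sum +. value, count + 1)

(* Called when an Erlang node asks for a result: the mean for one source,
   or None if nothing has been recorded for it. *)
let mean (source : string) : float option =
  match Hashtbl.find_opt store source with
  | Some (sum, count) when count > 0 -> Some (sum /. float_of_int count)
  | _ -> None

let () =
  record "server-a" 2.0;
  record "server-a" 4.0;
  match mean "server-a" with
  | Some m -> Printf.printf "mean = %g\n" m
  | None -> print_endline "no data"
```

The state stays inside the OCaml node; only the small query result would need to cross back to Erlang.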

> It's been a long time since I've written any Erlang :) I'm glad
> to see this thread alive again!

:)

> --
> _jsn

G B-)

Nov 27, 2007, 10:15:25 PM
to erlocaml-discuss


On Nov 27, 2:11 am, "Jason Dusek" <jason.du...@gmail.com> wrote:
> On Nov 26, 2007 7:18 AM, khigia <lcoque...@gmail.com> wrote:
>
> > In a first version, could we see the data passed between
> > erlang and ocaml as messages? (like erlang messages). That
> > way, there is no GC problem, is there?
>
> That seems right, but consider this scenario:
>
> An Erlang node manages two O'Caml programs -- one to compute
> the values of a matrix from the previous values (so it's
> encapsulated within an Erlang process with a heartbeat) and
> one to derive certain quantities from this matrix for display.
>
>   master (erlang)
>     heartbeat (erlang)
>       calculator (o'caml)
>     poller (erlang)
>       deriver (o'caml)
>
> So, for data to get to the deriver, we might have:
>
>   get data from calculator into heartbeat (results in a copy)
>   pass data from heartbeat to poller (another copy)
>   pass data from poller to deriver (a third copy)
>
> The middle copy is the result of the way that Erlang handles
> data passed between processes (though maybe HiPE averts this).
> However, the first and last copy are due to the absence of a
> shared memory approach.
I am a huge fan of Michael A Jackson's (the programmer, not the singer
of Thriller) two rules of Program Optimisation
(http://en.wikiquote.org/wiki/Michael_A._Jackson):
The First Rule of Program Optimisation: Don't do it.
The Second Rule of Program Optimisation - For experts only: Don't do
it yet.

It would be extremely useful to have something that works, is stable,
and is relatively easy to test and measure.
I had imagined that an external process (OCaml program) with a subset
of the port interface is the easiest way to start.
We can do an awful lot of experiments, and development with that.
Optimising that, by turning it into a driver, and sharing Erlang's GC
could come if we discover that there is significant value to be
gained.
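
As a rough idea of how small such an external process could be, here is a sketch of an OCaml program that speaks a length-prefixed subset of the port protocol. It assumes the Erlang side opens the port with `open_port({spawn, Cmd}, [{packet, 4}, binary])`, so that every message carries a 4-byte big-endian length header; the function names and the echo behaviour in `handle` are placeholders, not a proposed design.

```ocaml
(* Read one length-prefixed packet; None on end of input (port closed).
   input_binary_int reads a 4-byte big-endian integer. *)
let read_packet (ic : in_channel) : string option =
  match input_binary_int ic with
  | exception End_of_file -> None
  | len -> (try Some (really_input_string ic len) with End_of_file -> None)

(* Write one length-prefixed packet and flush so Erlang sees it at once. *)
let write_packet (oc : out_channel) (payload : string) : unit =
  output_binary_int oc (String.length payload);
  output_string oc payload;
  flush oc

(* Placeholder for the real computation: the payload is echoed back. *)
let handle (payload : string) : string = payload

(* Serve requests until the Erlang side closes the port. *)
let rec serve (ic : in_channel) (oc : out_channel) : unit =
  match read_packet ic with
  | None -> ()
  | Some payload ->
      write_packet oc (handle payload);
      serve ic oc

let () =
  set_binary_mode_in stdin true;
  set_binary_mode_out stdout true;
  serve stdin stdout
```

Because the payload is treated as an opaque string on both sides, this works unchanged whether the bytes are pickled OCaml values or anything else.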

>
> > That way, some messages can be opaque to erlang (seen as
> > binary) but some others could make sense (integer, float ...).
> > argh, this seems too simple, did I miss your point?
>
> Erlangers, correct me if I am wrong, but integer and float might
> appear all wrong when they go from Erlang to O'Caml or the other
> way -- Erlang's integer is truncated a bit, to make space for a
> little tag in front, and the float type in Erlang is 4 words
> long or something like that.
>
> Really, Erlang's value proposition lies in managing the transfer
> of data between O'Caml compute processes. Maybe it is best to
> enforce the opacity of all inter-O'Caml messages -- we're using
> Erlang for IPC and job control.
I agree with you.
I think this has the side-benefit that the amount of the port protocol
that must be implemented gets even smaller - all data is passed to
Erlang as a single binary + PID, and the pickled OCaml stuff needn't
be understood initially. Once this is working, we could choose how to
open up the messages.

GB

G B-)

Nov 27, 2007, 10:22:47 PM
to erlocaml-discuss


On Nov 27, 10:06 am, "Ludovic Coquelle" <lcoque...@gmail.com> wrote:
> OK, you're right. I got your point about multiple copies of the data.
> If we need to pass data between processes, we need to avoid copying when
> possible. I guess this implies using the driver interface between erlang and
> ocaml (and dealing with the two-GC problem).
I don't feel we should worry about the cost of copying until we have
something working, and a port interface seems much easier.
>
> As far as making things opaque for erlang, I still think that we don't have
> to enforce it: if ocaml wants to exchange opaque data, it can just create an
> erlang binary ... erlang doesn't try to interpret it. But we can keep all
> the erlang terms, so that if ocaml only wants to give an integer result back
> to erlang, that is still possible. Does this make sense?
>
> By the way, I have no experience at all with erlang drivers ... but I would
> like to try first to implement an erlang port.
I have no experience with Erlang drivers either, but a port interface
seems to involve a lot fewer problems. It isn't 'just' getting two
GCs to co-operate; there are issues about possible internal name
clashes, file management, compatible versions of libraries, signal
handlers, ...

A separate process, using a subset of the port interface (assuming
opaque data initially) seems to me to be an easier path to success
than starting with a driver.
(At my astronomy club, a guy asked what the quickest way is to build
his own big telescope. The answer was: build a small one first; you'll
learn about most of the problems more quickly.)

GB