In a nutshell: the only substantial requirement is that the remote
table and message serialization have to be the same in all binaries
that communicate with each other. This means that, with care, you can
use the same library code in different binaries, and they will be able
to send messages and closures.
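For concreteness, here is a minimal sketch of what sharing the remote table across binaries looks like (the module and function names are illustrative, not anything agreed in this thread):

{-# LANGUAGE TemplateHaskell #-}
-- Shared.hs: a library module compiled into every communicating binary
module Shared where

import Control.Distributed.Process
import Control.Distributed.Process.Closure (remotable)

worker :: String -> Process ()
worker msg = say ("worker received: " ++ msg)

remotable ['worker]  -- generates Shared.__remoteTable

Each binary then builds its node from the same table, e.g. newLocalNode transport (Shared.__remoteTable initRemoteTable), so that a closure built with $(mkClosure 'worker) in one binary names code that also exists in the other.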
A caveat: one should be careful to avoid using any architecture
dependent types (e.g. Int) in messages, since that type could have
different sizes on 32-bit and 64-bit machines. Use fixed-size types
instead, like Int32.
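The same caveat in code, with a throwaway message type (illustrative only, not from the thread):

{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
import Data.Binary (Binary)
import Data.Int (Int32)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

-- Risky: Int's range differs between 32-bit and 64-bit machines, so a value
-- built on one architecture may not survive the trip to the other.
-- data Counter = Counter Int

-- Safer: a fixed-size field means the same range and meaning on every node.
data Counter = Counter Int32
  deriving (Typeable, Generic)

instance Binary Counter  -- generic encoding; usable as a Cloud Haskell message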
These requirements may change if the language is extended to natively
support Static.
For many of the problems I'm interested in, having to have the same binary on all nodes would be a deal-breaker. I think the same would be true of anyone wanting to build loosely-coupled systems.
Many, many issues hide here. If you want to send a code pointer (and we do – that is what the static thing is all about), you can:
A. Send a code pointer (cheap). But this means that the same code has to live at the other end, so you can talk about it by name; and that’s precisely what you don’t want.
B. Send the code itself. And the code that it refers to, and so on, transitively.
C. A combination of (A) and (B). Start with (B), but cache the code at the remote end, so you don’t send the same code twice.
Under a vision of (C) you’d start with all-empty nodes, except one. When it spawns processes elsewhere, it must send the code (ALL the code, transitively). But the remote nodes cache it. Over time they all get all the code (well, all the code they need). World peace breaks out.
But it’s complicated to implement this vision:
· What do we mean by “same code”? Probably a fingerprint of the transitive closure of the code. Really the same all the way to the leaves.
· How do we ship code? Haskell source? Bytecode? Object code? Core lambda code?
In the end I think this is the Right Thing. But it’ll be some work to get there.
Simon
On Jun 21, 2013, at 5:43 AM, Simon Peyton-Jones <sim...@microsoft.com> wrote:
> In the end I think this is the Right Thing. But it’ll be some work to get there.
Absolutely. I should have mentioned this in my previous post, but sending functions from node to node also implies a tight coupling, for all the reasons you mentioned.
Erlang does not get around this either [1].
I work on two large distributed Erlang applications (Riak and Riak CS), and (for the most part) we never send functions around.
All of this being said, there certainly are applications where the advantages you get from this tighter coupling are worth the downsides.
In the end, I think there are lots of useful applications that can be built with Cloud Haskell just with message passing (no functions in the messages).
I'm not suggesting any changes, just trying to make a point that there are strong use-cases for just wanting some set of base types in common between nodes.
Hey - nice to see another Erlang programmer around here! I'd be well interested to get some feedback on ManagedProcess and Supervisor APIs - and Service.Registry (i.e., our version of gproc) when it's finished - from someone who's used their progenitors... See the "development" and "procreg" branches of https://github.com/haskell-distributed/distributed-process-platform/.
There's also another point in the design space that seems not to be mentioned: for a given application that "needs to send closures", those closures likely constitute a closed DSL with a *deep* embedding, and thus a serialized AST that can be interpreted by the recipient is actually perfectly valid! In many use cases, network latency is likely a bigger bottleneck than the cost of running a tiny interpreter over a first-order AST rather than running compiled code. Such an approach also SOLVES the tight coupling of binary versions issue!
I believe that your point is this. Rather than send a function “foo”, enumerate all the functions you want to send in a data type
data Fun = Foo | Bar Int | Woz Bool
and then interpret them at the far end
interpret :: Fun -> IO ()
interpret Foo = foo
interpret (Bar i) = bar i
interpret (Woz b) = woz b
This is fine when there is a fixed finite set of functions, and for many applications that may be the case. It amounts to manual defunctionalisation; perhaps inconvenient but no more than that.
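To flesh that out with the wire-level details (a sketch only; interpret is as above, and the Binary instance is one way to make Fun Serializable):

{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
import Control.Distributed.Process
import Control.Monad (forever)
import Control.Monad.IO.Class (liftIO)
import Data.Binary (Binary)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

data Fun = Foo | Bar Int | Woz Bool
  deriving (Typeable, Generic)

instance Binary Fun  -- Serializable = Binary + Typeable

-- The far end: pull Fun values out of the mailbox and run them.
interpreterLoop :: (Fun -> IO ()) -> Process ()
interpreterLoop interpret = forever $ do
  f <- expect            -- expect :: Serializable a => Process a
  liftIO (interpret f)

-- A sender elsewhere just does:  send interpreterPid (Bar 42)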
The time it *isn’t* the case is when you are spawning a process on a remote node. Then you need to give the function to run remotely, and at this point you are talking to the underlying infrastructure so you can’t have a per-application data type.
An alternative would be to somehow specify a process to be run, automatically, on every node. Then ‘spawn’ would be realised as sending a message to that process. Since you get to specify the process, it can interpret a per-application data type of all the processes you want to run. It’s arguable that this should be THE way that CH supports spawning, so as to give you maximal control. (For example the recipient process could decide that load was too heavy on that node, and forward on to another.) Letting the application decide policy while the infrastructure provides mechanism seems like a good plan.
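A sketch of that alternative (hypothetical names throughout – Job, runJob and the "spawner" label are invented for illustration):

{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
import Control.Distributed.Process
import Control.Monad (forever, void)
import Data.Binary (Binary)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

-- The per-application vocabulary of things that may be spawned.
data Job = CrunchNumbers Int | ServeRequests
  deriving (Typeable, Generic)

instance Binary Job

runJob :: Job -> Process ()
runJob (CrunchNumbers n) = say ("crunching " ++ show n)
runJob ServeRequests     = say "serving requests"

-- Started automatically on every node; 'spawn' becomes a message to it.
spawner :: Process ()
spawner = do
  getSelfPid >>= register "spawner"   -- a well-known name on each node
  forever $ do
    job <- expect
    -- Policy lives here: an overloaded node could instead forward the
    -- Job to the "spawner" registered on some other node.
    void (spawnLocal (runJob job))

Remote spawning is then nsendRemote nodeId "spawner" (CrunchNumbers 42) rather than a built-in spawn.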
Simon
> I believe that your point is this. Rather than send a function “foo”, enumerate all the functions you want to send in a data type [...] and then interpret them at the far end. [...] It amounts to manual defunctionalisation; perhaps inconvenient but no more than that.
It's worth noting that this is basically what a gen server (i.e., managed process) does, though without the single-data-type restriction, and with explicit support for rpc or throw-away (cast) interactions.
> An alternative would be to somehow specify a process to be run, automatically, on every node. Then ‘spawn’ would be realised as sending a message to that process. Since you get to specify the process, it can interpret a per-application data type of all the processes you want to run. It’s arguable that this should be THE way that CH supports spawning, so as to give you maximal control.
It's quite a heavy restriction for the general case though. And you'd have to write the interpreting loop each time, which is error prone as well as tedious. Plus consumers (wanting to spawn) wouldn't have any semantic guarantees, so unless they were written by the same person, chaos could ensue. :)
> (For example the recipient process could decide that load was too heavy on that node, and forward on to another.) Letting the application decide policy while the infrastructure provides mechanism seems like a good plan.
I agree with that tenet, but was thinking of approaching it differently. Managed processes already take a declarative (policy-based) approach to individual server processes with regard to unexpected traffic and server-side error handling. Supervisors move responsibility for error handling and recovery further up the chain. The Service API, which I'm working on now, encodes other policies such as service interdependency, QoS, addressing and so on into a framework that takes the drudgery out of wiring supervisor hierarchies and ensures service components start in the right order, are available at the right time and can be located easily across nodes. In order to manage services across nodes, another API (currently dubbed Service.Execution) will provide pre-packaged tools for load regulation and traffic shaping.
I’m afraid I don’t understand the question. Maybe someone else can help or you can re-ask? I’m racing towards the POPL deadline at the moment.
S
I think all that's being suggested is that rather than sending actual object code you would send an AST and interpret it at the other end. At its simplest the AST is a closure (function + arguments, which is what we have now), but the AST could be much richer than this, and application-specific, in principle. You could send a subset of HsSyn, for example, so long as the Names are known at the far end - so you agree on some baseline primitives, such as particular versions of the core packages.
We do already have something of this flavour in the Static implementation in distributed-process: https://github.com/haskell-distributed/distributed-static/blob/master/src/Control/Distributed/Static.hs
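As a toy illustration of that flavour (far simpler than the Static machinery linked above, and all names here are made up): the leaves of the AST are names both ends have agreed on, so only data ever crosses the wire.

import qualified Data.Map as Map

-- A first-order, application-specific AST. Deriving Generic/Binary
-- (not shown) would make it sendable as an ordinary message.
data Expr
  = Lit Int
  | Prim String          -- a name the receiving end knows, e.g. "add"
  | App Expr [Expr]

-- The agreed baseline on the receiving node.
baseline :: Map.Map String ([Int] -> Int)
baseline = Map.fromList
  [ ("add",     sum)
  , ("product", product)
  ]

eval :: Expr -> Maybe Int
eval (Lit n)             = Just n
eval (Prim _)            = Nothing   -- a bare primitive is not a value here
eval (App (Prim f) args) = Map.lookup f baseline <*> mapM eval args
eval (App _ _)           = Nothing

-- eval (App (Prim "add") [Lit 1, Lit 2]) == Just 3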
Cheers,
Simon
On 04/07/13 17:00, Simon Peyton-Jones wrote:
> I’m afraid I don’t understand the question. Maybe someone else can help or you can re-ask? I’m racing towards the POPL deadline at the moment.
On 03 July 2013 23:16, Carter Schonwald wrote:
> Simon, could you explain the latter position more? I don't understand how that's different from having a suitable "skeleton" / baseline EDSL for the alluded-to things (and I'm not sure what those alluded-to services may or may not be that you're referring to).
> Thanks!
> -Carter
On 30 June 2013, Carter Schonwald also wrote:
With respect to the matter of when you want to send a code fragment over the network and its execution is performance sensitive, I've some ideas/thoughts on how to support that, which tie into some related ideas I have for doing runtime code generation and exposing the resulting code as normal Haskell values at runtime, without needing to patch the GHC RTS at all!
NB: I'm still a month or two away from having the time to start doing the basic runtime code gen experiments, but my current experiments that relate to this make me pretty optimistic.
cheers
-Carter