Btw, I see it kept the name "remote" after all. So should we stop saying
"Cloud Haskell" then?
- Dylan
> I think the package name "remote" is a bit unspecific/generic to be honest.
I tend to agree, but it is better than "Cloud Haskell".
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
From: Ryan Newton <rrne...@gmail.com>
To: parallel...@googlegroups.com
Sent: Thu, Oct 27, 2011 14:19:45 GMT+00:00
Subject: Re: Cloud Haskell now on Hackage

... It does have an Erlang-based heritage. What about something
explicitly referencing Erlang?
Other alternatives? Jeff is open minded so speak now or forever hold
your peace ;-).
On Thu, Oct 27, 2011 at 6:33 AM, Erik de Castro Lopo
From: Johan Tibell <johan....@gmail.com>
To: parallel...@googlegroups.com
Sent: Thu, Oct 27, 2011 15:28:38 GMT+00:00
Subject: Re: Cloud Haskell now on Hackage
Not “Network” – it’s perfectly sensible to run this on a multicore. I could live with “Distributed” though.
Not “Cloud” – too buzzwordy, and everyone disagrees about what “cloud” means. (some people really dislike the use of the word “cloud” in this context)
“Actors” – that gets across the message-passing bit, but not the distributed bit. There’s already an “actor” package, FWIW.
(I’ve written a lot more email than code for Cloud Haskell, so my opinion should be weighted appropriately :-)
Cheers,
Simon
Erlang doesn't talk about "actors" though. They talk about spawning processes (which can send/receive...).
We need a name which won't be conflated with threads/processes, or Erlang processes...
Source: I write Erlang.
> Not “Network” – it’s perfectly sensible to run this on a multicore. I could live with “Distributed” though.
> Not “Cloud” – too buzzwordy, and everyone disagrees about what “cloud” means. (some people really dislike the use of the word “cloud” in this context)
> “Actors” – that gets across the message-passing bit, but not the distributed bit. There’s already an “actor” package, FWIW.
Yes, I was thinking of Distributed too.
Our existing concurrency libraries have things like:
Control.Concurrent
Control.Concurrent.STM
Control.Concurrent.MVar
etc
In keeping with that, I'd go with
Control.Distributed.*
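For comparison, this is the sort of code those existing base modules support today, with nothing distributed assumed:

    import Control.Concurrent      (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    main :: IO ()
    main = do
      box <- newEmptyMVar
      -- one thread sends, the main thread receives
      _ <- forkIO (putMVar box "hello from another thread")
      takeMVar box >>= putStrLn

A distributed layer under Control.Distributed.* would presumably read analogously, with nodes in place of threads.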
> Distributed.Actors
> or
> Distributed.Processes
Yes, and I'd be happy with * being Actors or Processes.
So, specifically then, my suggestion is:
Control.Distributed.Actors
or
Control.Distributed.Processes
Other people working on other similar distributed memory stuff can pick
other names in Control.Distributed.*
> And factor out the task stuff, providing just the messaging layer.
Right.
Control.Distributed.Task
Then for a package name, how about "distributed-actors"?
Duncan
> So, specifically then, my suggestion is:
> Control.Distributed.Actors
> or
> Control.Distributed.Processes
Ok... since this thread is rolling, maybe at the same time someone can tell me what to name a package for thread-safe mutable deques:
    Control.Concurrent.Deque (like Chan)
    Data.Deque
    Data.Deque.LockFree
    Data.Concurrent.Deque
    ??
This is the problem with ontologies. :)
Definitely under Data, I say. Data.Concurrent sounds like a good namespace for mutable concurrency-safe data structures (immutable data structures are already concurrency-safe).
Putting Concurrent before Deque allows us to group different concurrent data structures.
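For the record, here is a minimal sketch of the kind of interface such a module might export, built on a single MVar around two lists (a placeholder implementation, not the lock-free structure under discussion; all names are illustrative):

    import Control.Concurrent.MVar

    -- Front list and reversed back list, guarded by one MVar.
    newtype Deque a = Deque (MVar ([a], [a]))

    newDeque :: IO (Deque a)
    newDeque = Deque `fmap` newMVar ([], [])

    pushFront :: Deque a -> a -> IO ()
    pushFront (Deque v) x = modifyMVar_ v (\(f, b) -> return (x:f, b))

    pushBack :: Deque a -> a -> IO ()
    pushBack (Deque v) x = modifyMVar_ v (\(f, b) -> return (f, x:b))

    popFront :: Deque a -> IO (Maybe a)
    popFront (Deque v) = modifyMVar v pop
      where
        pop (x:f, b) = return ((f, b), Just x)
        pop ([], b)  = case reverse b of
                         []     -> return (([], []), Nothing)
                         (x:f') -> return ((f', []), Just x)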
Exciting! Is there a source repo somewhere? It would be nice if you
could record[1] that in the cabal file.
Regards,
Bas
[1] http://www.haskell.org/cabal/users-guide/#source-repositories
Never mind: I just noticed the github repos
(https://github.com/jepst/CloudHaskell) in the paper.
> It would be nice if you could record that in the cabal file.
You just got a pull request to do just that ;-)
Bas
What do you think of the idea of separating the task layer abstraction
into a separate package?
Jeff
Isn't a defining characteristic here that the same binary runs on all nodes? That is very un-Erlangish.
Alexander
Cloud Haskell is now available from Hackage!
You are correct. We welcome proposals that would help mitigate this limitation.
Jeff
There are two different questions here:
1. Can you modify the code of the program while it is running?
2. Even if the program is stable, can nodes have different binaries?
Erlang answers "yes" to (1) and hence of course "yes" to (2). Of course, modifying the program while it is running can lead to all manner of strange and unpredictable errors.
In a typed language (1) is much, much harder. What does soundness mean? There is research on this question (look at Acute and Mike Hicks's work) but we made no attempt to address it in Cloud Haskell. So for (1) we answer firmly NO. That's a limitation, but it is one that is very hard to lift.
On the other hand (2) is much more nuanced. If you send a function closure over the network, you surely do not want to send the transitive closure of all the code that can be executed starting at that closure. That could be megabytes of code! No, surely you will imagine a global "code base", and pointers into it (call them labels), like "the code for function f". Now when sending a closure, just send the *label* for its code. If a node wants to run that code, it can fetch it from the global codebase; and cache it thereafter.
Running the same binary is just an extreme version in which each node pre-fetches the entire code base. But it's just an implementation strategy. In principle it can all be fetched lazily.
So in Cloud Haskell (2) is very much an implementation tactic. We chose something simple for Day 1, that's all.
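To make that concrete, here is a minimal sketch of the label scheme (the representation of object code and the transport to the global code base are stand-ins, not Cloud Haskell's actual implementation):

    import qualified Data.Map as Map
    import Data.IORef

    -- A label names code in the global code base, e.g. "Foo.f".
    newtype Label = Label String deriving (Eq, Ord, Show)

    data Code = Code String           -- stand-in for real object code

    type CodeBase = Label -> IO Code  -- fetch from the global code base
    type Cache    = IORef (Map.Map Label Code)

    -- Resolve a label: consult the local cache first; otherwise fetch
    -- from the global code base and cache the result thereafter.
    resolve :: CodeBase -> Cache -> Label -> IO Code
    resolve global cache lbl = do
      m <- readIORef cache
      case Map.lookup lbl m of
        Just c  -> return c
        Nothing -> do
          c <- global lbl
          modifyIORef cache (Map.insert lbl c)
          return c

Running the same binary everywhere then amounts to pre-populating the cache with the entire code base.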
Simon
If we could name ABIs globally, then a code label can be an ABI identifier plus a function name. Then, when a node receives a code label that it doesn't already have, it can "cabal install" the package and dynamically link the code.
This relies on having stable ABIs, which is something we've talked about in the past. Stable ABIs are useful for other things too - e.g. upgrading packages without breaking packages that depend on them.
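In types, the idea might look like this (purely illustrative; no such API exists today):

    -- A stable, globally unique name for a package ABI, e.g. a hash of
    -- the package name, version, flags and the ABIs it depends on.
    newtype ABIId = ABIId String

    -- A code label under this scheme: which ABI to fetch and link,
    -- plus the fully qualified name of the function within it.
    data CodeLabel = CodeLabel
      { labelABI  :: ABIId    -- identifies the package build to "cabal install"
      , labelName :: String   -- e.g. "Data.List.sort"
      }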
Cheers,
Simon
I am not sure I fully get what you are both talking about.
SPJ, in solving 2), if you change any function, then it is possible that the transitive closure of code that depends on the changed function has changed. The same is true for all functions if the compiler changed, or if libc was upgraded. Think in terms of the git hash of the code+build environment. The *meaning* might have changed.
Code size is not really an issue. Collecting all code that runs a distributed system is a solved problem if one can disregard time. It only requires linking a huge number of libraries. Upgrading is the interesting problem. When upgrading we are replacing code, and introducing new code that never existed in the system to begin with, much like what you categorize as 1).
To deal with the issue of changed "meaning", an engineering solution is to use stable interfaces. The meaning of a stable interface is by definition the same for a client, even if the implementation of that interface is upgraded. Thinking in terms of git hash, the stable interface has a fixed hash.
Here's my quick and dirty design for stable identifiers and interfaces for a distributed Haskell, based on thinking like git. Let's define two types of identifiers: Id = CodeId | InterfaceId
- CodeId: SHA-256 hierarchical hash of the code and the Ids of its dependencies (the name would not be hashed).
- InterfaceId: SHA-256 hash of the interface definition; types and names.
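A rough Haskell rendering of these identifiers (the hash function below is a stub, not real SHA-256, and all names are illustrative):

    -- Placeholder digest type; a real implementation would use an
    -- actual SHA-256 library.
    newtype Sha256 = Sha256 String deriving (Eq, Show)

    newtype CodeId      = CodeId Sha256       -- hash of code + dependency Ids
    newtype InterfaceId = InterfaceId Sha256  -- hash of types and names only

    data Id = C CodeId | I InterfaceId

    -- Hierarchical: hash the (unnamed) code together with the Ids of its
    -- dependencies, so a change anywhere below changes the CodeId.
    codeId :: String -> [Id] -> CodeId
    codeId src deps = CodeId (sha256 (src ++ concatMap render deps))
      where
        render (C (CodeId (Sha256 h)))      = h
        render (I (InterfaceId (Sha256 h))) = h

    -- Stub hash; swap in a real SHA-256 implementation.
    sha256 :: String -> Sha256
    sha256 = Sha256 . ("sha256:" ++)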
Assuming GHC would calculate these Ids and store them somewhere as metadata, we could do the following operations:
- After building a new binary, would the remote calls it wants to make be serviceable by the running nodes at a given service level (i.e. do enough of the available nodes support those calls to sustain the required qps)? This can and should be answered before bringing the new build up in a production environment. If the new binary makes a remote call to something that is not an InterfaceId, it might or might not work, depending on the difference between the binary and what is available on the network (are the CodeIds the same? They could be, by chance). If remote calls are only made to InterfaceIds, then there is a fair chance that it will work out.
- Which InterfaceIds and CodeIds do not exist on the network anymore? The InterfaceIds can be "garbage-collected" from the code base. The CodeIds can be garbage-collected from the "artifact/binary code repository".
GHC would be the component that could produce these Ids, but I do think they solve the problem of having a unique Id attached to a piece of behavior.
Alexander
According to Dylan, the Data.* thing was a different discussion.
So as I understand it the main question was whether we use a top-level
Distributed.* or Control.Distributed.*. I'm not sure if we decided
between .Process and .Actor, but if we decided Process that's fine.
I've been advocating using module names that match our existing
standard module names for concurrency, since this is "just" distributed
concurrency. The base package has Control.Concurrent.*, and I argue we
should pick Control.Distributed.*
Hence Control.Distributed.Process
> What do you think of the idea of separating the task layer abstraction
> into a separate package?
Yes. These layers should not be tied together in one package. There is
plenty of scope in the design space for alternatives to the Task layer
and these might also be built on top of the Process layer. It also
simplifies the API documentation to keep them separate.
Duncan
> What do you think of the idea of separating the task layer abstraction
> into a separate package?
I guess the nominees would be this:
Control.Distributed.Actor
Control.Distributed.Process
With the former having my vote. Likewise, Duncan put in another vote for splitting out the task layer into a separate package, which I also support. Hopefully, if no new strong opposition arises, these measures can both go forward...
-Ryan
Are we near consensus then? I vote for Actor over Process, just because Process has another strong meaning (System.Process) and those will pollute Google search results.
> I guess the nominees would be this:
> Control.Distributed.Actor
And it's not as if we can't change the haskell-mpi package.
Duncan
If there are no further opinions, I'll make this change in the next version.
What about package/product name?
Jeff
FWIW, I'd be happy with just "Control.Distributed". By analogy, we have
"Control.Concurrent", not "Control.Concurrent.Thread".
Cheers,
Simon
It's true, but in the case of concurrency there are fewer design choices
and Control.Concurrent is what is provided by "Concurrent Haskell",
implemented directly in the RTS and exported by the base package. The
difference here is the higher chance of other implementations /
approaches in the same Control.Distributed.* area.
--
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/
Sure, but module names do not have to be globally unique. It's unlikely
that you'd need more than one kind of Control.Distributed in any given
program.
I just have the feeling that "Control.Distributed.Process" is like
giving 3 digits of precision when you only care about 2, and making you
type in the extra stuff every time even though it doesn't change. Maybe
it's just me.
Cheers,
Simon