Re: nsend* and register for remote processes


Tim Watson

Jan 13, 2015, 6:29:03 AM
to Vershilov, Alexander, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries, Duncan Coutts
Hi all,

Looping in the folks at Well-Typed and the google groups as this isn't a discussion we should continue in isolation.

So yes, Erlang does have different rules and semantics when it comes to registered/named processes. Cloud Haskell, at the point where I picked it up, had disallowed local (name) registration of remote processes, and the nsendRemote primitive is evidence of this. Whilst Facundo is correct in pointing out that Erlang's unified semantics support remote process registration on a local node, I do not believe that the Cloud Haskell semantics do - see http://haskell-distributed.github.io/static/semantics.pdf for details. Have I misunderstood this?

Now, whilst I do not object to changing distributed-process to support this in principle, it concerns me that we will try to then build reliable distributed algorithms based upon it, and the semantics for it are undefined. It also introduces the need to monitor processes on remote nodes whenever they're registered, or risk the registry filling up with garbage pids that are no longer valid on their home node. That seems like a big ugly mess to me.

Alexander mentioned gproc - erlang's extended process registry. We have one of these too, which currently runs only on a local node but *does* support adding remote pids - https://github.com/haskell-distributed/distributed-process-registry

Now, this registry supports many of the features of gproc and was in fact modelled on gproc's design. It does *not* however provide a clustered/distributed view, because I do not see that as its core purpose. My idea when writing distributed-process-registry was that people would add their own clustering/distributed mechanism on top of it, running the registry process on multiple nodes and adding their own application-specific coordination layer on top of that. Such a coordination layer could be based on Control.Distributed.Process.Global, a paxos-esque distributed transaction mechanism, or something else entirely, since it's up to the user to decide what semantics they want for coordinating registered names across multiple nodes.

Facundo/Mathieu, Alexander - does this make sense?

Edsko/Duncan, anything to add about the reasons why you originally disallowed registration of remote processes in the local registry? Personally I think it's a bad idea, but I'm always up for being told I'm wrong. :)

Cheers,
Tim


On 13 January 2015 at 07:21, Vershilov, Alexander <alexander...@tweag.io> wrote:
Hi, Tim.

I have spent some time trying to understand the Erlang situation, and have found the following:

1. Register: http://www.erlang.org/doc/man/erlang.html#register-2

Failure: badarg if PidOrPort is not an existing, local process or port, if RegName is already in use,
if the process or port is already registered (already has a name), or if RegName is the atom undefined.

This says that a Pid or Port cannot be registered in the local registry if the process is remote.

2. Send: http://erlang.org/doc/reference_manual/expressions.html#id78439

Sends the value of Expr2 as a message to the process specified by Expr1. The value of Expr2
is also the return value of the expression.

Expr1 must evaluate to a pid, a registered name (atom) or a tuple {Name,Node},
where Name is an atom and Node a node name, also an atom.

  • If Expr1 evaluates to a name, but this name is not registered, a badarg run-time error will occur.
  • Sending a message to a pid never fails, even if the pid identifies a non-existing process.
  • Distributed message sending, that is if Expr1 evaluates to a tuple {Name,Node} (or a pid located at another node), also never fails.

3. Erlang lacks a primitive for sending a message by registry name to a remote process.

4. Most OTP services accept 3 forms of destination:

        * Pid
        * {local, LocalName::atom()}
        * {global, GlobalName::atom()}

5. To get a global registry, Erlang people use gproc.

6. In the future unified Erlang semantics, remote processes are allowed in the local registry and in send, as was pointed out by Facundo.

However, for uniformity, in this semantics names can be registered for remote processes (i.e., register(name,pid)
does not fail if pid is a remote process), and registering a local process at a remote node is supported too (using
the operation register(node,name,pid)). As a consequence, when a message is sent to a remote node using the syntax
{atom,node}!msg there is no guarantee that the process that should receive the message is located at node;
thus it may be necessary to relay the message to a process on yet another node

So in Erlang the situation is OK. However, we have a solution that sits somewhere between future and current Erlang, and it is not so obvious, especially considering that we have additional rules for serialization:

a. distributed-process _can_ store remote processes in a local registry.

register :: String -> ProcessId -> Process ()

Register a process with the local registry (asynchronous). This version will wait until a response
is gotten from the management process. The name must not already be registered. The process need
not be on this node. A bad registration will result in a ProcessRegistrationException

The process to be registered does not have to be local itself.

b. distributed-process has an additional function, nsendRemote:
nsendRemote :: Serializable a => NodeId -> String -> a -> Process ()

which is roughly {Name, Node} ! Message, but not {Pid, Node} ! Message (do we lack that completely, or is it no longer needed?).

c. nsend :: Serializable a => String -> a -> Process () (in unsafePrimitives)

Named send to a process in the local registry (asynchronous)

The message is not serialized (so we may assume it's only for local processes), but the documentation says nothing about the semantics, and in reality the message just gets dropped (in Erlang we would get a badarg). I appreciate that this is an unsafe module and may have non-obvious semantics, but then it should be clearly documented.


d. same story here:

-- | Named send to a process in the local registry (asynchronous).
-- This function makes /no/ attempt to serialize and (in the case when the
-- destination process resides on the same local node) therefore ensure that
-- the payload is fully evaluated before it is delivered.
unsafeNSend :: Serializable a => String -> a -> Process ()
unsafeNSend = Unsafe.nsend


Summing all this up: I think we should update distributed-process somehow. We need to settle on a general strategy: either we target the unified semantics or the current Erlang implementation. If we select the former, we need to update the implementation to support remote processes in the registry and sending to them using the same nsend API.

If we select the latter, we need to ban remote processes in the local registry and update the documentation.
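
To make the mismatch concrete, here is a minimal sketch in plain Haskell. It is a toy model of the behaviour described in this thread, not code from distributed-process: the types and the names registerName, nsendCurrent and nsendUnified are made up for illustration, and the result of a named send is modelled simply as the pid that would receive the message (Nothing meaning the message is dropped without an error).

```haskell
-- Toy model only; none of these names exist in distributed-process.
type NodeName = String

data Pid = Pid { pidNode :: NodeName, pidSerial :: Int }
  deriving (Eq, Show)

-- A node-local registry: registered name -> process identifier.
type Registry = [(String, Pid)]

-- Registration as documented for `register`: the name must be free,
-- but the pid does not have to live on the registering node.
registerName :: String -> Pid -> Registry -> Maybe Registry
registerName name pid reg =
  case lookup name reg of
    Just _  -> Nothing                   -- name already in use
    Nothing -> Just ((name, pid) : reg)

-- Behaviour reported in this thread: a named send only reaches a
-- process on the sending node; otherwise it is dropped silently.
nsendCurrent :: NodeName -> Registry -> String -> Maybe Pid
nsendCurrent here reg name =
  case lookup name reg of
    Just pid | pidNode pid == here -> Just pid
    _                              -> Nothing

-- Unified semantics: deliver to whatever is registered, local or remote.
nsendUnified :: Registry -> String -> Maybe Pid
nsendUnified reg name = lookup name reg
```

In this model a remote pid can be registered successfully, yet nsendCurrent on that name yields Nothing while nsendUnified yields the remote pid, which is exactly the gap between the two strategies.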

--
Alexander


On Mon, Jan 12, 2015 at 8:02 PM, Tim Watson <watson....@gmail.com> wrote:
Iirc this is part of the formal semantics so we can't just change it. Certainly erlang's process registry is node local and I believe this has to do with monitoring. If we allow a remote process to be registered then we have to monitor it to ensure cleanup which adds work to the cb and also creates a reliability issue in terms of relying on the behaviour of registered pids.

> On 12 Jan 2015, at 11:35, Facundo Domínguez <facundo...@gmail.com> wrote:
>
> Hello Tim,
>   We noticed a while ago that the node controller assumes that
> processes in the local registry are local. See for instance
> postMessage [1].
>
>   We are tempted to change this so processes in the local registry
> are allowed to be remote.
>
>   Do you have any remarks about why things are currently this way and
> whether our proposal is ok?
>
> Best,
> Facundo
>
> [1] https://github.com/haskell-distributed/distributed-process/blob/85068c20e9b4765aad6c276aac90194c60299152/src/Control/Distributed/Process/Node.hs#L1114


Vershilov, Alexander

Jan 13, 2015, 7:04:06 AM
to Tim Watson, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries, Duncan Coutts
Hi, Tim.

First I want to mention that this is my personal view, which may differ from the
company perspective.

I'm OK with either approach but not a mixture, i.e. the current
distributed-process implementation should be consistent. So either:

1. Disallow saving remote processes in the registry. // mimic current Erlang with gproc

2. Allow remote processes in the registry and allow nsend to send messages
to any remote or local process. // mimic unified semantics

3. (worse for me but..) Allow remote processes in the local registry, but hide nsend
from the safe API and document the constraints in unsafe. // hybrid approach

The current situation seems inconsistent: we can save a remote process to the
local registry but can't use `nsend` with it (basically we will see no error if that happens),
all those functions are part of the public safe API, and this behaviour is not
documented anywhere.

Now about the points you have raised.

When I read the Cloud Haskell semantics I did not find a description of the semantics
of `nsend`; did I miss it?

About monitoring, I personally think that it doesn't interfere with named send
and the ability to register processes in any way (unless our implementation guarantees
that a ProcessId will be unregistered if it dies, but I hope that's not the case). And
in both Erlang and our semantics sending to a stored name should never fail and we
don't give any additional delivery guarantees. We can't give any reliable guarantees
about the processes on the remote node; monitoring + reliable connections
are only an approximation, and there may be a huge time gap between sending to
a remote process and the monitor event about that process's death. So I see no
difference between sending to a remote node by send or by nsend. Also
it doesn't interfere with the reliability guarantees that Cloud Haskell gives but Erlang doesn't.

I'd like to keep gproc out of this conversation, at least until we have clarified
the general strategy.


> Edsko/Duncan, anything to add about the reasons why you originally disallowed
> registration of remote processes in the local registry? Personally I think it's a bad
> idea, but I'm always up for being told I'm wrong. :)

I see reasons why disallowing may be good; moreover, I mention it as one
way to resolve this situation. I also see reasons for allowing remote processes
to be added to the registry and allowed in `nsend`, and see one use case [1] that can
be easily implemented with this design. But I don't see any reason for allowing
remote processes in the local registry while disallowing them in nsend.

That was my personal view, but I think Facundo or Mathieu can give more info
from their point of view and our project perspective.

[1] With the ability to save remote processes in the local registry it's possible to
implement message forwarding that is completely transparent to the
user. Say we want to introduce a process with the role "Leader", and that
role may move from one process to another (ignoring failures for simplicity).
Then each process can register a Leader, which can be any local or remote
process, and sending `nsend "Leader" Message` will travel through the cluster,
possibly over many hops, and eventually be delivered to the Leader even if
it also moves within the cluster. The real situation is more complicated, but
this may serve as a simple illustration of the unified semantics (note the unification of
local and remote processes and the possibility of sending messages through the cluster
over a number of hops).

--
Alexander



Tim Watson

Jan 14, 2015, 9:01:22 AM
to Vershilov, Alexander, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries, Duncan Coutts
Hi Alexander,

On 13 January 2015 at 12:04, Vershilov, Alexander <alexander...@tweag.io> wrote:
> I'm OK with either approach but not a mixture, i.e. the current
> distributed-process implementation should be consistent. So either:
>
> 1. Disallow saving remote processes in the registry. // mimic current Erlang with gproc
>
> 2. Allow remote processes in the registry and allow nsend to send messages
> to any remote or local process. // mimic unified semantics
>
> 3. (worse for me but..) Allow remote processes in the local registry, but hide nsend
> from the safe API and document the constraints in unsafe. // hybrid approach


I think (2) is probably (?) what we want but...
 
> The current situation seems inconsistent: we can save a remote process to the
> local registry but can't use `nsend` with it (basically we will see no error if that happens),
> all those functions are part of the public safe API, and this behaviour is not
> documented anywhere.

So the thing about the current implementation is that it allows you to register a remote process on the local node but not nsend to it. What the local NC *does* do however is notify any nodes on which a local process P is registered, that P has died (so it can be unregistered). This is an attempt at the "monitoring and cleanup" aspects I mentioned previously. I think it's flawed though, because there's no guarantee that we *know* all the other nodes on which P is registered, so I think we'd be better off not even trying to do this.
 

> Now about the points you have raised.
>
> When I read the Cloud Haskell semantics I did not find a description of the semantics
> of `nsend`; did I miss it?


My bad, it's not mentioned at all.
 
> About monitoring, I personally think that it doesn't interfere with named send
> and the ability to register processes in any way (unless our implementation guarantees
> that a ProcessId will be unregistered if it dies, but I hope that's not the case)


Whoa, on the contrary I hope that if a process dies *it is* guaranteed it will be unregistered! Why would you not want this to happen!?
 
> And in both Erlang and our semantics sending to a stored name should never fail and we
> don't give any additional delivery guarantees.

This bit I agree with.
 
> We can't give any reliable guarantees about the processes on the remote node; monitoring + reliable connections
> are only an approximation, and there may be a huge time gap between sending to
> a remote process and the monitor event about that process's death. So I see no
> difference between sending to a remote node by send or by nsend. Also
> it doesn't interfere with the reliability guarantees that Cloud Haskell gives but Erlang doesn't.


I half agree when it comes to sending, but I do *not* agree that we can avoid cleaning up (i.e., unregistering) processes that die. And the other half about sending is that you *only* know about nodes on which P is registered when you've been informed about them. So I'm not convinced there are any useful semantics to be had here. On what guarantees can you actually rely in this case? There are good reasons why Erlang's process registry eschews remote registration!

 
> I'd like to keep gproc out of this conversation, at least until we have clarified
> the general strategy.


I'm happy to discuss that separately.
 
> > Edsko/Duncan, anything to add about the reasons why you originally disallowed
> > registration of remote processes in the local registry? Personally I think it's a bad
> > idea, but I'm always up for being told I'm wrong. :)
>
> I see reasons why disallowing may be good; moreover, I mention it as one
> way to resolve this situation. I also see reasons for allowing remote processes
> to be added to the registry and allowed in `nsend`, and see one use case [1] that can
> be easily implemented with this design. But I don't see any reason for allowing
> remote processes in the local registry while disallowing them in nsend.


Yes sorry I misunderstood that part - I thought we'd done this on purpose but it's clearly just an oversight. You're right that we should aim for consistency.

 
> That was my personal view, but I think Facundo or Mathieu can give more info
> from their point of view and our project perspective.


I'd be interested to hear from the others too!
 
> [1] With the ability to save remote processes in the local registry it's possible to
> implement message forwarding that is completely transparent to the
> user. Say we want to introduce a process with the role "Leader", and that
> role may move from one process to another (ignoring failures for simplicity).
> Then each process can register a Leader, which can be any local or remote
> process, and sending `nsend "Leader" Message` will travel through the cluster,
> possibly over many hops, and eventually be delivered to the Leader even if
> it also moves within the cluster. The real situation is more complicated, but
> this may serve as a simple illustration of the unified semantics (note the unification of
> local and remote processes and the possibility of sending messages through the cluster
> over a number of hops).

That's a broken implementation of leader election unless the nodes are all coordinating with one another. You need real consensus for leader election, and name registration doesn't solve that problem; instead, distributed name registration *relies* on consensus algorithms (such as paxos and the like), which is why I'm loth to implement it at this (utility) library level. I *do* think Cloud Haskell users want the ability to do something like `sendToLeader message`, but not via this set of APIs...

Thoughts people?

Tim Watson

Jan 14, 2015, 11:40:00 AM
to cloud-haskel...@googlegroups.com, alexander...@tweag.io, distribut...@googlegroups.com, facundo...@gmail.com, mb...@tweag.net, ed...@well-typed.com, dun...@well-typed.com
Addendum..


On Wednesday, 14 January 2015 14:01:22 UTC, Tim Watson wrote:
> > I see reasons why disallowing may be good; moreover, I mention it as one
> > way to resolve this situation. I also see reasons for allowing remote processes
> > to be added to the registry and allowed in `nsend`, and see one use case [1] that can
> > be easily implemented with this design. But I don't see any reason for allowing
> > remote processes in the local registry while disallowing them in nsend.
>
> Yes sorry I misunderstood that part - I thought we'd done this on purpose but it's clearly just an oversight. You're right that we should aim for consistency.


Actually I think I'm wrong (maybe we're both wrong?) and this _was_ done on purpose. Both (re)register and nsend are local, and both offer remote variants (reregisterRemoteAsync and nsendRemote). The point is that the caller decides whether they're dealing with a local or remote target, both in registering and communicating with the intended recipient. We might argue for location transparency here, but I don't believe what we have is actually inconsistent! And IMHO an argument for location transparency would need to be backed up by a use case that doesn't involve trying to shoehorn a complex distributed algorithm into the base library, since I personally feel such things are better presented as separate packages layered on top of distributed-process.

What are your collective thoughts about this, folks?

Vershilov, Alexander

Jan 14, 2015, 3:36:14 PM
to Tim Watson, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries, Duncan Coutts

Hi, Tim.

I've sent an email to the erlang-questions mailing list [1], asking about the
reasons for the current behaviour and their thoughts on the unified
semantics. I hope it will throw some more light on the situation.

[1] https://groups.google.com/forum/m/#!topic/erlang-programming/xBBrFBLAANk

Tim Watson

Jan 20, 2015, 3:04:30 PM
to Duncan Coutts, Vershilov, Alexander, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries
Hi Duncan,

Thanks for getting back to us!

On 20 January 2015 at 10:51, Duncan Coutts <dun...@well-typed.com> wrote:
Thanks Tim, sorry for the slow reply.


Sure no probs.
 
So my initial thought is certainly that in general we stuck to the
Erlang's unified semantics and only deviated on any issue after careful
thought. I don't recall giving much thought to the process registry.

So it's good that people are considering it carefully now, but
personally I'd consider it by starting from the unified semantics and
asking if there's any compelling reason to deviate (rather than the
other way around).


I tend to agree with that in principle.
 
> Now, whilst I do not object to changing distributed-process to support this
> in principle, it concerns me that we will try to then build reliable
> distributed algorithms based upon it, and the semantics for it are
> undefined. It also introduces the need to monitor processes on remote nodes
> whenever they're registered, or risk the registry filling up with garbage
> pids that are no longer valid on their home node. That seems like a big
> ugly mess to me.

Yep, semantics are important.

So I've re-read the unified semantics paper and indeed it is very clear
that you can register remote pids in the local registry (and local pids
in a remote registry) and that nsend works as you'd expect. That is you
can nsend using the local registry to send to a local or remote process
(depending on what is registered locally), or you can nsend using a
remote registry in which case the nsend goes to that node, the lookup is
done there and again the message can be sent to a local or remote
process. (Note that this means that it's possible to nsend to one node,
and have the message ultimately delivered to another node).

Yes of course it is possible that a remote process may have gone away.

We would need to make some changes to the node controller to support this. Not many, but some.
 

In my opinion, what is a bit odd about the process registry is not
registering remote processes, but the existence of nsend in the first
place. I would have expected the primitives to be simply register and
lookup, with named send being a derived operation rather than a
primitive. The difference of course is that with nsend to a remote node
(for delivery on that node), only one message need be sent, rather than
one for lookup and a follow-up with the main message.

But here's what's odd: we have a named send, but none of the
corresponding linking and monitoring. I can't actually see any sensible
way of using the nsend primitive for the case of sending to a remote
node for local delivery on that node. For reliability we must be able to
link or monitor, but nsend does not help us with that. But to link or
monitor we have to do a lookup to get the actual pid, so we can use the
link/monitor primitives. But if we look up the pid, then there's no need
for nsend, as we've lost its benefit. We would need something like
nlink, nmonitor. And then there's the thorny issue of the registry
changing between calls.

So, if one got rid of nsend and one always had to do lookups anyway,
then it makes perfect sense to allow registering remote pids in a local
registry (and local in a remote).

How do erlang programmers use nsend in the first place? Is it only used
for "fire and forget" in cases where reliability is not needed but the
latency of an extra round trip is critical? Seems very niche.


So there's quite a lot to consider here. Firstly you're right about the lack of nlink and nmonitor - those are conspicuous by their absence in Erlang and the reason is tied up with the process registry's refusal to name a non-local process. Erlang's process registry *does* track process deaths and therefore when a process dies its name is removed from the process registry automatically. Sending to a dead pid in Erlang never fails, but sending to a name that is not locally registered is an error and will fail!

<terminal>
8> P = self().
<0.48.0>
9> exit("foo").
** exception exit: "foo"
10> P ! bar.
bar
11> flush().
ok
12> P.
<0.48.0>
13> is_process_alive(P).
false
14> register(foo, self()).
true
15> foo ! bar.
bar
16> flush().
Shell got bar
ok
17> exit("bar").
** exception exit: "bar"
18> foo ! bar.  
** exception error: bad argument
     in operator  !/2
        called as foo ! bar
</terminal> 

That's how the current beam implementation works, not necessarily how the unified semantics say things should work. I need to re-read the unified semantics to figure out how sending to a named process that has died *should* work. But needless to say, we can't just ignore garbage pids in the process registry since doing so is effectively a memory leak.
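
For readers more at home in Haskell than in the Erlang shell, the behaviour seen in that session can be approximated with a small pure model. This is only a sketch under assumptions: the Node record and the functions exitProcess, sendPid and sendName are invented for illustration and are not part of Erlang or distributed-process. Right True means delivered, Right False means silently dropped, and Left models the badarg error.

```haskell
-- Toy model of a single node; all names here are hypothetical.
type Pid = Int

data Node = Node
  { alive    :: [Pid]            -- processes currently running
  , registry :: [(String, Pid)]  -- locally registered names
  } deriving (Eq, Show)

-- When a process exits, its registry entries disappear with it.
exitProcess :: Pid -> Node -> Node
exitProcess p (Node procs reg) =
  Node (filter (/= p) procs) (filter ((/= p) . snd) reg)

-- Modelling `Pid ! Msg`: never an error, even for a dead pid.
sendPid :: Pid -> Node -> Either String Bool
sendPid p node = Right (p `elem` alive node)

-- Modelling `Name ! Msg`: an error when the name is not registered.
sendName :: String -> Node -> Either String Bool
sendName name node =
  case lookup name (registry node) of
    Nothing -> Left "badarg"
    Just p  -> sendPid p node
```

The asymmetry is the point: after exitProcess, sendPid on the old pid still succeeds as a no-op, while sendName on the old name fails because the entry has been cleaned up.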

As for monitoring/linking, I agree this is something that's missing. We /could/ change nsend so that it works transparently, but the monitoring/linking issue would need to be dealt with by setting up monitors for all registered remote pids and clearing pids from the registry each time they die. That's quite a lot of overhead!

An alternative which I prefer, would be your suggestion of implementing nsend in terms of whereis and send instead. The problem with that however is that there is an obvious race in it, so it's unlikely to be used much. Another alternative that IMHO works well is to take the approach beam uses and refuse to register remote pids, since there are no monitoring overheads for unregistered local processes that die.
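
As a rough illustration of named send as a derived operation, here is a pure sketch. It is a model, not the library: whereisModel, sendModel and nsendDerived are made-up stand-ins for a registry lookup (`whereis`) followed by an ordinary `send`, and mailboxes are modelled as lists of strings.

```haskell
-- Toy model only; these are stand-ins, not distributed-process functions.
type Pid = Int
type Registry  = [(String, Pid)]
type Mailboxes = [(Pid, [String])]  -- live processes and their queued messages

-- Registry lookup: a snapshot of who holds the name right now.
whereisModel :: String -> Registry -> Maybe Pid
whereisModel = lookup

-- Send never fails: a message for a pid without a mailbox is discarded.
sendModel :: Pid -> String -> Mailboxes -> Mailboxes
sendModel pid msg = map deliver
  where
    deliver (p, queue)
      | p == pid  = (p, queue ++ [msg])
      | otherwise = (p, queue)

-- Named send as lookup followed by send; an unknown name is a no-op.
nsendDerived :: String -> String -> Registry -> Mailboxes -> Mailboxes
nsendDerived name msg reg boxes =
  maybe boxes (\pid -> sendModel pid msg boxes) (whereisModel name reg)
```

Because the lookup and the send are separate steps, the message goes to whichever pid held the name at lookup time; a stale entry simply results in a discarded message rather than an error.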

People that want remote registered pids and named send + monitoring of named pids can use my distributed-process-registry package, which provides exactly those features. It does of course incur all the obvious monitoring overheads, but use of it is then the choice of the programmer rather than something we impose on all users of the base distributed-process library.

Those are my two pennies. Happy to continue this conversation though folks - I like that we're bike-shedding this to come up with a solution that works for everyone. Be interested to hear from users in high throughput environments that might potentially care about lots of additional remote monitors. Folks at tweag or anyone else??

Cheers,
Tim 

Tim Watson

Jan 21, 2015, 8:05:32 AM
to Duncan Coutts, Vershilov, Alexander, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries
On 21 January 2015 at 11:07, Duncan Coutts <dun...@well-typed.com> wrote:
On Tue, 2015-01-20 at 20:04 +0000, Tim Watson wrote:

My reading of the unified semantics is that if you nsend to a process
that is not registered then nothing happens, it's just discarded. It is
also clear that the node controller removes the pid from the registry
when it is notified that it has died. It looks like it will do that for
local processes and for remote pids in the local registry when it
receives a link/monitor notification about that pid. So that'd indicate
that local processes are removed promptly, and remote ones would only be
removed if/when the nc gets a notification about them.


Agreed.
 
However, in the unified semantics the node controller is only notified
eventually (asynchronously) of a local process exiting. So it's still
possible in that semantics to nsend to a local process that is still in
the registry but has exited.


That is, I think, possibly different from the beam semantics, though I'm not 100% sure.
 
So you say that erlang guarantees that it's an error to send to a local
registered process that has died, whereas in the unified semantics
that's a no-op (it's either discarded when the registry lookup fails, or
when delivering to the now-dead process). But erlang cannot give that
guarantee for a send to a remote node with a locally registered process
on that node. Presumably in that case erlang also just silently discards
the message? (And given that one cannot link or monitor, how is it
used?)


Yes I believe that if you evaluate `{nodename, registeredname} ! Message` in Erlang and the remote name isn't registered it's silently discarded. Let me double check when I'm back in front of a proper operating system (I'm on windows right now and don't have a proper development environment set up)...
 
> As for monitoring/linking, I agree this is something that's missing. We
> /could/ change nsend so that it works transparently, but the
> monitoring/linking issue would need to be dealt with by setting up monitors
> for all registered remote pids and clearing pids from the registry each
> time they die. That's quite a lot of overhead!

Not just overhead but too much policy for this level I'd say.


Well I do agree on both counts, but I think that tracking remote pids /without/ monitoring is bad because we can end up with a lot of garbage in the local registry. So if we continue to allow remote pids to be registered, we should be setting up monitors for them I think, which afaict from looking at the code we are currently not.
 
> An alternative which I prefer, would be your suggestion of implementing
> nsend in terms of whereis and send instead. The problem with that however
> is that there is an obvious race in it, so it's unlikely to be used much.
> Another alternative that IMHO works well is to take the approach beam uses
> and refuse to register remote pids, since there are no monitoring overheads
> for unregistered local processes that die.

It's not clear to me that there is a race, we take a snapshot of the
registry and send to the process that was registered at that moment
(which may no longer be the processes registered, or it may no longer be
alive).

Given that we can do registry lookups, that's always an issue.


Yeah that's actually a fair point.
 
I honestly still don't see how anyone can use nsend or its erlang
equivalent in a sane way. Given that you cannot use linking or
monitoring with it the only way to use it seems to be to send
self-contained messages and where the sender waits for a reply with a
timeout (since no link or monitor).


Well it's fine for fire-and-forget messaging.
 
So an issue with registering remote pids is knowing when to eventually
remove them. The unified semantics indicate that the nc removes them
when it gets a link/monitor message, but it's not guaranteed to get one
of those for a remote process.


Exactly. I think that's a bug, inasmuch as if we *do* continue to support remote pids then we should set up monitors for them. I don't get why that guarantee isn't present in the unified semantics; it doesn't make sense without it. I suppose the lack of a guarantee is due to the fact that we don't implement heartbeats between nodes. That would ensure that monitoring a remote process would /eventually/ result in a signal if the entire node went away.
 
> People that want remote registered pids and named send + monitoring of
> named pids can use my distributed-process-registry package, which provides
> exactly those features. It does of course incur all the obvious monitoring
> overheads, but use of it is then the choice of the programmer rather than
> something we impose on all users of the base distributed-process library.

Well, note that we'd only be paying for what we ask for. There's no
extra overhead if you stick to registering only local processes.


That's a good point.

If we re-implemented nsend in terms of `whereis` and `send`, we would get the behaviour that the unified semantics calls for, viz if the process had died the send would just silently fail. That might be a good thing, but we still have the issue that we will effectively leak memory if lots of remote processes are registered but not monitored. Even monitoring them isn't guaranteed to generate a signal either, given the lack of heartbeats.

I'm still not sure what the right thing to do is. Does someone want to summarise the options or shall I do so? I'll do it later this afternoon (UK time) if nobody has more to add at this stage.

Cheers,
Tim

Vershilov, Alexander

Jan 21, 2015, 1:54:09 PM
to Tim Watson, Duncan Coutts, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries
I'll try to summarise it.

1. Registry.
1.1 It should be possible to register remote processes in the local
registry (and local processes in a remote registry)
1.2 If a remote process is stored in the registry then the Node Controller
starts monitoring that process
1.3 If a process dies and we get an asynchronous notification then we
remove the process from the registry

2. nsend
2.1 nsend should be implemented in terms of lookup (`whereis`) and send
2.2 nsend doesn't raise any exceptions, regardless of whether the process
is stored in the registry or not

Together this means that a message sent using nsend or nsendRemote is
not guaranteed to be delivered, and we need to call restart if we get
a connection failure (as that's a property of send).
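
As a rough illustration of rules 1.1 to 1.3, the registry side could be modelled as below. This is only a sketch under stated assumptions: the `NC` record, `register` and `processDied` are hypothetical names for a pure toy model, not the node controller in distributed-process.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Label = String
newtype NodeId = NodeId String deriving (Eq, Ord, Show)
data Pid = Pid { pidNode :: NodeId, pidSerial :: Int } deriving (Eq, Ord, Show)

-- Hypothetical node-controller state: its own identity, the name registry,
-- and the set of remote pids it currently monitors.
data NC = NC
  { ncNode     :: NodeId
  , ncRegistry :: Map.Map Label Pid
  , ncMonitors :: Set.Set Pid
  } deriving (Eq, Show)

-- Rules 1.1 and 1.2: any pid may be registered; a remote pid is also monitored.
register :: Label -> Pid -> NC -> NC
register label pid nc = nc
  { ncRegistry = Map.insert label pid (ncRegistry nc)
  , ncMonitors = if pidNode pid /= ncNode nc
                   then Set.insert pid (ncMonitors nc)
                   else ncMonitors nc
  }

-- Rule 1.3: on a death notification, drop every label bound to that pid
-- and stop tracking the monitor.
processDied :: Pid -> NC -> NC
processDied pid nc = nc
  { ncRegistry = Map.filter (/= pid) (ncRegistry nc)
  , ncMonitors = Set.delete pid (ncMonitors nc)
  }
```

Local registrations add nothing to `ncMonitors` here, which matches the earlier point that you only pay for monitoring when you register remote processes.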

Also, the required properties of monitoring are an open question:

1. should monitoring a registry entry automatically unmonitor the old
process and monitor the new one when the registry entry
is updated?
2. should we create a RegistryChanged monitor event?
3. if the monitored process should be updated on change, then could we
have a race between these events:
a. the process that was registered dies (so we should see a ProcessDied event)
b. the registry changes (if this event occurs before a. then we should not
see a ProcessDied event)

Do we have some way to create an online document, or a document
in the repository, where we could
put all the required semantics that everybody agrees on?

Tim Watson

Jan 21, 2015, 4:04:25 PM
to cloud-haskel...@googlegroups.com, watson....@gmail.com, dun...@well-typed.com, distribut...@googlegroups.com, facundo...@gmail.com, mb...@tweag.net, ed...@well-typed.com
Thanks Alexander,

Some thoughts inline below..


On Wednesday, 21 January 2015 20:52:28 UTC, Alexander Vershilov wrote:
I'll try to summarise it.

1. Registry.
1.1 It should be possible to register remote processes in the local
registry (and local processes in a remote registry)
1.2 If a remote process is stored in the registry then the Node Controller
starts monitoring that process
1.3 If a process dies and we get an asynchronous notification then we
remove the process from the registry


Are we all agreed on this then? I certainly get the impression that some users would prefer to ban remote process registrations as the beam implementation of erlang does instead. Whilst I agree with Duncan that it's a good idea to stick to the unified semantics as closely as possible, I must admit it would be /easier/ if we banned remote pids from the registry. We do need to decide on this either way.
 
2. nsend
2.1 nsend should be implemented in terms of lookup (`whereis`) and send
2.2 nsend doesn't raise any exceptions, regardless of whether the process
is stored in the registry or not


These changes are fairly insignificant so that seems fine to me. In what way shall we then redefine nsendRemote once we do this? Thoughts?
 
Together this means that a message sent using nsend or nsendRemote is
not guaranteed to be delivered, and we need to call restart if we get
a connection failure (as that's a property of send).


All forms of sending in Cloud Haskell are fire and forget and minus any delivery guarantees, so I have no problem with that. Why do we need to call restart?
 
Also, the required properties of monitoring are an open question:


I don't think we should support monitoring of registry entries. That's a whole other can of worms. Erlang doesn't support this at all, but gproc does. Likewise I don't think that level of detail should be present in distributed-process, but you can do *all* of those things using distributed-process-registry, which is a fairly simple package.
 
1. should monitoring a registry entry automatically unmonitor the old
process and monitor the new one when the registry entry
is updated?
2. should we create a RegistryChanged monitor event?

These and many more features exist in distributed-process-registry already.
 
3. if the monitored process should be updated on change, then could we
have a race between these events:
a. the process that was registered dies (so we should see a ProcessDied event)
b. the registry changes (if this event occurs before a. then we should not
see a ProcessDied event)

That's an interesting race to consider. Can you take a look at https://github.com/haskell-distributed/distributed-process-registry/blob/master/src/Control/Distributed/Process/Registry.hs and see if that's actually possible? You'll probably need to reference at least the documentation in https://github.com/haskell-distributed/distributed-process-client-server/blob/master/src/Control/Distributed/Process/ManagedProcess.hs in order to ascertain the server's behaviour. I do not think that we will see this race at all.
 

Do we have some way to create an online document, or a document
in the repository, where we could
put all the required semantics that everybody agrees on?


Can you add a new documentation section (or page) to https://github.com/haskell-distributed/haskell-distributed.github.com so we can maintain a view for future reference on the website please? That way people can make pull requests and once we're happy with the spec we can take PRs to implement the required changes.

Cheers,
Tim

Alexander V Vershilov

Jan 21, 2015, 4:36:12 PM
to Tim Watson, cloud-haskel...@googlegroups.com, Duncan Coutts, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries
On 22 January 2015 at 00:04, Tim Watson <watson....@gmail.com> wrote:
> Thanks Alexander,
>
> Some thoughts inline below..
>
> On Wednesday, 21 January 2015 20:52:28 UTC, Alexander Vershilov wrote:
>>
>> I'll try to summarise it.
>>
>> 1. Registry.
>> 1.1 It should be possible to register remote processes in the local
>> registry (and local processes in a remote registry)
>> 1.2 If a remote process is stored in the registry then the Node Controller
>> starts monitoring that process
>> 1.3 If a process dies and we get an asynchronous notification then we
>> remove the process from the registry
>>
>
> Are we all agreed on this then? I certainly get the impression that some
> users would prefer to ban remote process registrations as the beam
> implementation of erlang does instead. Whilst I agree with Duncan that it's
> a good idea to stick to the unified semantics as closely as possible, I must
> admit it would be /easier/ if we banned remote pids from the registry. We do
> need to decide on this either way.

I totally agree with both you and Duncan that it's a good idea to
stick to the unified semantics. Quoting it:


Definition 18 Let the function deleteDead(pid,nc) be defined in the
obvious way; deleting all occurrences of pid from the node controller
structure nc. In the case when the pid represents a node (i.e. it is in
fact a nid), the nc should be cleared of all processes at that node as
well.

And the effect of the event reporting a remote process death (Table 12.2) is:
deleteDead(nid;<links;mns;reg>)

So according to the semantics we should delete the process from the registry.
Or maybe I have misunderstood something?
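
Read literally, Definition 18 could be modelled along the following lines. This is a hedged sketch, not code from the paper or from distributed-process: the `NC` record is a heavily simplified version of the <links; mns; reg> structure, and `Dead` and `isDead` are names invented here to cover both the pid and the nid case.

```haskell
import qualified Data.Map.Strict as Map

newtype NodeId = NodeId String deriving (Eq, Ord, Show)
data Pid = Pid NodeId Int deriving (Eq, Ord, Show)

-- Either a single process or a whole node has been reported dead.
data Dead = DeadProcess Pid | DeadNode NodeId

-- Node controller structure <links; mns; reg>, heavily simplified:
-- links and monitors are pairs of pids, the registry maps names to pids.
data NC = NC
  { links :: [(Pid, Pid)]
  , mns   :: [(Pid, Pid)]
  , reg   :: Map.Map String Pid
  } deriving (Eq, Show)

-- True when the given pid is covered by the death report.
isDead :: Dead -> Pid -> Bool
isDead (DeadProcess p) q         = p == q
isDead (DeadNode n)    (Pid m _) = n == m

-- Delete all occurrences of the dead pid (or of every pid on the dead
-- node) from links, monitors and the registry.
deleteDead :: Dead -> NC -> NC
deleteDead d (NC ls ms r) =
  NC (keep ls) (keep ms) (Map.filter (not . isDead d) r)
  where
    keep = filter (\(a, b) -> not (isDead d a || isDead d b))
```

Under this reading the registry entry disappears as a side effect of the same clean-up that removes links and monitors, so no separate rule is needed for it.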


>>
>> 2. nsend
>> 2.1 nsend should be implemented in terms of lookup (`whereis`) and send
>> 2.2 nsend doesn't raise any exceptions, regardless of whether the process
>> is stored in the registry or not
>>
>
> These changes are fairly insignificant so that seems fine to me. In what way
> shall we then redefine nsendRemote once we do this? Thoughts?

Yes. And regarding 2.2, I hope I have correctly understood Table 14.2, which
describes the whereis semantics.

Also, I think that in the unsafe modules we can keep the current
implementation, which does not serialize messages but will not work
for remote processes.

I don't see how nsendRemote would be different: it's just the same
`lookup+send` but performed on another host, isn't it? So it will work
as "send a control message to another host", and that host then does
nsend (lookup+send).

>>
>> Together this means that a message sent using nsend or nsendRemote is
>> not guaranteed to be delivered, and we need to call restart if we get
>> a connection failure (as that's a property of send).
>>
>
> All forms of sending in Cloud Haskell are fire and forget and minus any
> delivery guarantees, so I have no problem with that. Why do we need to call
> restart?

Do I understand correctly that, with the reliability guarantees, if something
happens to the connection to the target process then we need to call restart
in order to send new messages to that process? With nsend we will have
something similar, but now the target process can change during the program's
lifetime. I can't tell if we will have problems there, but this is a place we
need to check twice, especially because it's not covered in the unified
semantics at all.


>>
>> Also, the required properties of monitoring are an open question:
>>
>
> I don't think we should support monitoring of registry entries. That's a
> whole other can of worms. Erlang doesn't support this at all, but gproc
> does. Likewise I don't think that level of detail should be present in
> distributed-process, but you can do *all* of those things using
> distributed-process-registry, which is a fairly simple package.

Fair enough.

>>
>> 1. should monitoring a registry entry automatically unmonitor the old
>> process and monitor the new one when the registry entry
>> is updated?
>> 2. should we create a RegistryChanged monitor event?
>
>
> These and many more features exist in distributed-process-registry already.
>
>>
>> 3. if the monitored process should be updated on change, then could we
>> have a race between these events:
>> a. the process that was registered dies (so we should see a ProcessDied event)
>> b. the registry changes (if this event occurs before a. then we should not
>> see a ProcessDied event)
>
>
> That's an interesting race to consider. Can you take a look at
> https://github.com/haskell-distributed/distributed-process-registry/blob/master/src/Control/Distributed/Process/Registry.hs
> and see if that's actually possible? You'll probably need to reference at
> least the documentation in
> https://github.com/haskell-distributed/distributed-process-client-server/blob/master/src/Control/Distributed/Process/ManagedProcess.hs
> in order to ascertain the server's behaviour. I do not think that we will
> see this race at all.

I'll check the code tomorrow and will be able to reply thoroughly then. For
now I can give a quick reply that may not be totally correct. I think that
with remote processes such a race is possible, because the ProcessDied and
label change events can originate from different hosts, so what happens
will depend on the message order. If we don't need such strong guarantees
then it's OK, but it should be made clear. I think that for most programs
it doesn't matter anyway.
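
The order dependence can be shown with a tiny model. This is a sketch only: `Event`, `Notice` and `observe` are hypothetical names, and the rule that a rebind moves the monitor to the new pid is an assumption taken from open question 1, not the behaviour of distributed-process-registry.

```haskell
newtype Pid = Pid Int deriving (Eq, Show)

-- Events the registry may receive, possibly from different hosts.
data Event = Died Pid | Rebound Pid deriving (Eq, Show)

-- What a client monitoring the label gets to see.
data Notice = ProcessDied Pid | RegistryChanged Pid deriving (Eq, Show)

-- Walk the events in arrival order, tracking the pid the label is
-- currently bound to (if any) and collecting the resulting notices.
observe :: Maybe Pid -> [Event] -> [Notice]
observe _ [] = []
observe _ (Rebound new : rest) = RegistryChanged new : observe (Just new) rest
observe current (Died p : rest)
  | current == Just p = ProcessDied p : observe Nothing rest
  | otherwise         = observe current rest  -- death of a pid we no longer watch
```

With the label initially bound to `Pid 1`, the sequence `[Died (Pid 1), Rebound (Pid 2)]` yields a `ProcessDied` notice followed by `RegistryChanged`, whereas `[Rebound (Pid 2), Died (Pid 1)]` yields only `RegistryChanged`. The same two events produce different observations depending purely on arrival order.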

>>
>> Do we have some way to create an online document, or a document
>> in the repository, where we could
>> put all the required semantics that everybody agrees on?
>>
>
> Can you add a new documentation section (or page) to
> https://github.com/haskell-distributed/haskell-distributed.github.com so we
> can maintain a view for future reference on the website please? That way
> people can make pull requests and once we're happy with the spec we can take
> PRs to implement the required changes.

OK I'll be able to do it tomorrow.

>
> Cheers,
> Tim
>



--
Alexander

Simon Peyton Jones

Jan 21, 2015, 4:54:29 PM
to Vershilov, Alexander, Tim Watson, Duncan Coutts, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries, Simon Peyton Jones
Friends

I'm not following all the details of this conversation, but I do think it would be good for Cloud Haskell to have a documented, well-specified semantics covering points like this.

The email thread refers to "the semantic notes" (where?) and the "unified semantics" (paper I think). The (generally excellent) Cloud Haskell web site http://haskell-distributed.github.io/ has no link to semantics under "Documentation", but under "Resources" we find a link to "Formal semantics" which gives a "DRAFT" semantics from Well Typed dated 2012.

Finally Alexander below says
| Do we have some way to create an online document, or a document
| in the repository, where we could
| put all the required semantics that everybody agrees on?

I think he's right. It would be great to have a "live", shared, community document that says what the semantics is supposed to be, and is kept up to date. (Include pointers to its honourable predecessors!)

Simon
Alexander V Vershilov

Jan 21, 2015, 5:23:40 PM
to Simon Peyton Jones, Vershilov, Alexander, Tim Watson, Duncan Coutts, cloud-haskel...@googlegroups.com, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries
Hi, Simon.

On 22 January 2015 at 00:54, Simon Peyton Jones <sim...@microsoft.com> wrote:
> Friends
>
> I'm not following all the details of this conversation, but I do think it would be good for Cloud Haskell to have a documented, well-specified semantics covering points like this.
>
> The email thread refers to "the semantic notes" (where?) and the "unified semantics" (paper I think). The (generally excellent) Cloud Haskell web site http://haskell-distributed.github.io/ has no link to semantics under "Documentation", but under "Resources" we find a link to "Formal semantics" which gives a "DRAFT" semantics from Well Typed dated 2012.

Currently we have the following documents:

1. Original article
[http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf]

@article{Epstein:2011:THC:2096148.2034690,
author = {Epstein, Jeff and Black, Andrew P. and Peyton-Jones, Simon},
title = {Towards Haskell in the Cloud},
journal = {SIGPLAN Not.},
issue_date = {December 2011},
volume = {46},
number = {12},
month = sep,
year = {2011},
issn = {0362-1340},
pages = {118--129},
numpages = {12},
url = {http://doi.acm.org/10.1145/2096148.2034690},
doi = {10.1145/2096148.2034690},
acmid = {2034690},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {erlang, haskell, message-passing},
}

It was not referred to in this thread.

2. Cloud haskell semantics
[http://haskell-distributed.github.io/static/semantics.pdf]

This is a draft by Well-Typed; it was referred to as the Cloud Haskell
semantics in this thread.

I hope that the sources of this draft will be saved in the repository
once we have them, or recreated. Currently this paper is the actual
specification for Cloud Haskell; however, it does not cover everything,
and its main purpose (as far as I understand) is to document the
differences from the unified semantics for future Erlang (3). So it's
assumed that if something is not specified there then it should work as
in the unified semantics (if specified there), unless there is a
significant problem with that approach. At this point we have spotted
such a problem, so this whole thread is about understanding the current
situation and specifying the semantics (to be added to this document)
based on the current Erlang semantics, the unified semantics and common
sense.

3. Unified semantics for Future Erlang
[http://happy-testing.com/hans/papers/EW2010-UnifiedSemantics.pdf]

@inproceedings{Svensson:2010:USF:1863509.1863514,
author = {Svensson, Hans and Fredlund, Lars-{\AA}ke and Benac Earle, Clara},
title = {A Unified Semantics for Future Erlang},
booktitle = {Proceedings of the 9th ACM SIGPLAN Workshop on Erlang},
series = {Erlang '10},
year = {2010},
isbn = {978-1-4503-0253-1},
location = {Baltimore, Maryland, USA},
pages = {23--32},
numpages = {10},
url = {http://doi.acm.org/10.1145/1863509.1863514},
doi = {10.1145/1863509.1863514},
acmid = {1863514},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {erlang, semantics},
}

It was referred to as the unified (sometimes "uniform", by my mistake)
semantics, and there was a link to that paper at the beginning of this thread.

4. Sometimes we were referring to the current BEAM implementation, but
there is no full article about it; those were just pointers to the actual
documentation.

I think it's a mistake that our main site is missing those links and a
description of how each article relates to the actual implementation.
I'm going to make a pull request tomorrow that will add them.


>
> Finally Alexander below says
> | Do we have some way to create an online document, or a document
> | in the repository, where we could
> | put all the required semantics that everybody agrees on?
>
> I think he's right. It would be great to have a "live", shared, community document that says what the semantics is supposed to be, and is kept up to date. (Include pointers to its honourable predecessors!)

You are right.



--
Alexander

Alexander Vershilov

Jan 22, 2015, 1:54:52 PM
to cloud-haskel...@googlegroups.com, alexander...@tweag.io, distribut...@googlegroups.com, facundo...@gmail.com, mb...@tweag.net, ed...@well-typed.com, dun...@well-typed.com
Hi.

After this discussion I've pushed the following changes to the site (and Tim has applied them):

1. A link to the unified semantics for future Erlang paper in Resources.
But I think we need to somehow structure the relevant articles in the 'Documentation'
section (though that's off-topic for this thread).

2. A development wiki page, which I think may be used as a sandbox for discussing
changes to the semantics:
  http://haskell-distributed.github.io/wiki/development.html

I also put there what we have decided (?) about the semantics. Monitoring
of the registry is not covered for now.

I think that if no one has any objections to this plan then I can start
working on it next week. The current proposal for new rules (it's not a change,
as these topics were not specified in the semantics) will solve the problem we
started with. But it's still not a full specification for the Registry. Some
work is still required.

--
Alexander

Tim Watson

Jan 23, 2015, 9:38:49 AM
to Alexander Vershilov, cloud-haskel...@googlegroups.com, Alexander Vershilov, distribut...@googlegroups.com, Facundo Domínguez, Mathieu Boespflug, Edsko de Vries, Duncan Coutts
Thanks Alexander,

On 22 January 2015 at 18:54, Alexander Vershilov <alexander...@gmail.com> wrote:
After this discussion I've pushed the following changes to the site (and Tim has applied them):


Cheers for doing that!
 
2. A development wiki page, which I think may be used as a sandbox for discussing
changes to the semantics:
  http://haskell-distributed.github.io/wiki/development.html


This is definitely a good idea.
 
I also put there what we have decided (?) about the semantics. Monitoring
of the registry is not covered for now.


I'll take a look at that now.
 
I think that if no one has any objections to this plan then I can start
working on it next week. The current proposal for new rules (it's not a change,
as these topics were not specified in the semantics) will solve the problem we
started with. But it's still not a full specification for the Registry. Some
work is still required.


Yeah let's just make sure everyone is comfortable with (or willing to compromise on) what's being proposed. Thanks for volunteering to work on this next week - really appreciate that! I think that's good timing too, since it gives people the weekend to discuss and further refine any changes if necessary. It also gives me time to get various other outstanding pull requests merged. :)

I'm now going to read through the proposals on the website and see if *I* actually agree with them too. I'll report back. ;)

Cheers all,
Tim 
