Hi, Tim.
I have spent some time trying to understand the situation in Erlang,
and have found the following:
1. Register: http://www.erlang.org/doc/man/erlang.html#register-2
Failure: badarg if PidOrPort is not an existing, local process or port, if RegName is already in use,
if the process or port is already registered (already has a name), or if RegName is the atom undefined.
This says that it is not possible to register a Pid or Port in the local registry if the process is remote.
2. Send: http://erlang.org/doc/reference_manual/expressions.html#id78439
Sends the value of Expr2 as a message to the process specified by Expr1. The value of Expr2
is also the return value of the expression.
Expr1 must evaluate to a pid, a registered name (atom) or a tuple {Name,Node},
where Name is an atom and Node a node name, also an atom.
- If Expr1 evaluates to a name, but this name is not registered, a badarg run-time error will occur.
- Sending a message to a pid never fails, even if the pid identifies a non-existing process.
- Distributed message sending, that is if Expr1 evaluates to a tuple {Name,Node} (or a pid located at another node), also never fails.
3. Erlang lacks a primitive for sending a message by registry name to a remote process.
4. Most OTP services accept three forms of destination:
* Pid
* (local, LocalName::atom())
* (global, GlobalName::atom())
5. To get a global registry, Erlang users use gproc.
6. In the future unified Erlang semantics, remote processes are allowed in the local registry
and in send, as was pointed out by Facundo.
However, for uniformity, in this semantics names can be registered for remote processes (i.e., register(name,pid)
does not fail if pid is a remote process), and registering a local process at a remote node is supported too (using
the operation register(node,name,pid)). As a consequence, when a message is sent to a remote node using the syntax
{atom,node}!msg there is no guarantee that the process that should receive the message is located at node;
thus it may be necessary to relay the message to a process on yet another node
So in Erlang the situation is OK. However, we have a solution that sits somewhere between future and current
Erlang and that is not so obvious, especially considering that we have additional rules for serialization:
a. distributed-process _can_ store remote processes in a local registry.
register :: String -> ProcessId -> Process ()
Register a process with the local registry (asynchronous). This version will wait until a response
is gotten from the management process. The name must not already be registered. The process need
not be on this node. A bad registration will result in a ProcessRegistrationException.
The process to be registered does not have to be local itself.
b. distributed-process has an additional function, nsendRemote:
nsendRemote :: Serializable a => NodeId -> String -> a -> Process ()
which is something like (Name, Node) ! Message, but not (Pid, Node) ! Message (are we missing that completely,
or is it no longer needed?).
c. nsend :: Serializable a => String -> a -> Process () (in unsafePrimitives)
Named send to a process in the local registry (asynchronous)
The message is not serialized (so we may assume it is only for local processes), but the documentation
says nothing about the semantics, and in reality the message just gets dropped (in Erlang we get a badarg).
I appreciate that this is an unsafe module and may have non-obvious semantics, but then that should
be clearly documented.
d. same story here:
-- | Named send to a process in the local registry (asynchronous).
-- This function makes /no/ attempt to serialize and (in the case when the
-- destination process resides on the same local node) therefore ensure that
-- the payload is fully evaluated before it is delivered.
unsafeNSend :: Serializable a => String -> a -> Process ()
unsafeNSend = Unsafe.nsend
Summing all this up, I think we should somehow update distributed-process.
We need to decide on a general strategy: either we target the unified semantics or the current Erlang
implementation. If we select the former, then we need to update the implementation to
support remote processes in the registry and sending to them using the same nsend API.
If we select the latter, we need to ban remote processes in the local registry and update
the documentation.
--
Alexander
On Mon, Jan 12, 2015 at 8:02 PM, Tim Watson <watson....@gmail.com> wrote:
Iirc this is part of the formal semantics so we can't just change it. Certainly erlang's process registry is node local and I believe this has to do with monitoring. If we allow a remote process to be registered then we have to monitor it to ensure cleanup which adds work to the cb and also creates a reliability issue in terms of relying on the behaviour of registered pids.
> On 12 Jan 2015, at 11:35, Facundo Domínguez <facundo...@gmail.com> wrote:
>
> Hello Tim,
> We noticed a while ago that the node controller assumes that
> processes in the local registry are local. See for instance
> postMessage [1].
>
> We are tempted to change this so processes in the local registry
> are allowed to be remote.
>
> Do you have any remarks about why things are currently this way and
> whether our proposal is ok?
>
> Best,
> Facundo
>
> [1] https://github.com/haskell-distributed/distributed-process/blob/85068c20e9b4765aad6c276aac90194c60299152/src/Control/Distributed/Process/Node.hs#L1114
I'm OK with either of the approaches but not with a mixture, i.e. the current
distributed-process implementation should be consistent. So either:
1. Disallow remote processes from being saved to the registry. // mimic current Erlang with gproc
2. Allow remote processes in the registry and allow nsend to send messages
to any remote or local process. // mimic unified semantics
3. (worse for me but..) Allow remote processes in the local registry, but hide nsend
from the safe API and document the constraints in unsafe. // hybrid approach
The current state seems inconsistent: we can save a remote process to the
local registry but can't use `nsend` on it (basically we see no error if that happens),
all those functions are part of the public safe API, and this behaviour is not
documented anywhere.
Now about the points you have raised.
When I read the Cloud Haskell semantics I did not find a description of the semantics
for `nsend`; did I miss it?
About monitoring, I personally think that it doesn't interfere with named send
and the ability to register processes in any way (unless our implementation guarantees
that a ProcessId will be unregistered if it dies, but I hope that's not the case).
And in both Erlang and our semantics sending to a stored name should never fail, and we
don't give any additional delivery guarantees.
We can't give any reliable guarantees about processes on a remote node; monitoring + reliable connections
are only an approximation, and there may be a huge time gap between sending to
a remote process and the monitor event about the process's death. So I see no
difference between sending to a remote node by send or by nsend. Also,
it doesn't interfere with the reliability guarantees that Cloud Haskell gives but Erlang doesn't.
I'd like to keep gproc out of this conversation, at least until we have clarified
the general strategy.
> Edsko/Duncan, anything to add about the reasons why you originally disallowed
> registration of remote processes in the local registry? Personally I think it's a bad
> idea, but I'm always up for being told I'm wrong. :)
I see reasons why disallowing may be good; moreover, I mentioned it as one
way to resolve this situation. I also see reasons for allowing remote processes
to be added to the registry and allowed in `nsend`, and I see one use case [1] that can
be easily implemented with this design. But I don't see any reason for allowing
remote processes in the local registry while disallowing them in nsend.
This is my personal view, but I think Facundo or Mathieu can give more info
from their point of view and from our project's perspective.
[1] With the ability to save remote processes in the local registry it is possible to
implement message forwarding that is completely transparent to the
user. Say we want to introduce a process with the role "Leader", and that
role may move from one process to another (ignoring failures for simplicity).
Then each process can register a Leader, which can be any local or remote
process, and sending `nsend "Leader" Message` will travel through the cluster,
possibly over many hops, and will eventually be delivered to the Leader even if
it also moves within the cluster. The real situation is more complicated, but
this may serve as a simple illustration of the unified semantics (note the unification of
local and remote processes and the possibility of sending messages through the cluster
over a number of hops).
> I see reasons why disallowing may be good; moreover, I mentioned it as one
> way to resolve this situation. I also see reasons for allowing remote processes
> to be added to the registry and allowed in `nsend`, and I see one use case [1] that can
> be easily implemented with this design. But I don't see any reason for allowing
> remote processes in the local registry while disallowing them in nsend.
Yes sorry I misunderstood that part - I thought we'd done this on purpose but it's clearly just an oversight. You're right that we should aim for consistency.
Hi, Tim.
I've sent an email to the erlang-questions mailing list [1], asking about the
reasons for the current behaviour and their thoughts on the unified
semantics. I hope it will shed some more light on the situation.
[1] https://groups.google.com/forum/m/#!topic/erlang-programming/xBBrFBLAANk
Thanks Tim, sorry for the slow reply.
So my initial thought is certainly that in general we stuck to
Erlang's unified semantics and only deviated on any issue after careful
thought. I don't recall giving much thought to the process registry.
So it's good that people are considering it carefully now, but
personally I'd consider it by starting from the unified semantics and
asking if there's any compelling reason to deviate (rather than the
other way around).
> Now, whilst I do not object to changing distributed-process to support this
> in principle, it concerns me that we will try to then build reliable
> distributed algorithms based upon it, and the semantics for it are
> undefined. It also introduces the need to monitor processes on remote nodes
> whenever they're registered, or risk the registry filling up with garbage
> pids that are no longer valid on their home node. That seems like a big
> ugly mess to me.
Yep, semantics are important.
So I've re-read the unified semantics paper and indeed it is very clear
that you can register remote pids in the local registry (and local pids
in a remote registry) and that nsend works as you'd expect. That is you
can nsend using the local registry to send to a local or remote process
(depending on what is registered locally), or you can nsend using a
remote registry in which case the nsend goes to that node, the lookup is
done there and again the message can be sent to a local or remote
process. (Note that this means that it's possible to nsend to one node,
and have the message ultimately delivered to another node).
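To make the routing described above concrete, here is a minimal pure sketch, not the distributed-process implementation. The types `NodeId`, `Name`, `Pid`, `Registries` and the functions `register` and `nsendVia` are all hypothetical stand-ins invented for illustration. It assumes one name table per node and that a lookup miss means the message is discarded.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-ins for illustration; not the distributed-process types.
type NodeId = String
type Name   = String

data Pid = Pid { pidNode :: NodeId, pidSerial :: Int }
  deriving (Eq, Show)

-- One name table per node, keyed by the node that owns the table.
type Registries = Map.Map NodeId (Map.Map Name Pid)

-- Bind a name to any pid (local or remote) in the registry of 'owner'.
-- A newer binding for the same name replaces the older one.
register :: NodeId -> Name -> Pid -> Registries -> Registries
register owner name pid =
  Map.insertWith Map.union owner (Map.singleton name pid)

-- Named send through the registry of 'owner': the lookup happens on that
-- node, and the result is the process that finally receives the message.
-- Nothing models the message being discarded (name not registered).
nsendVia :: NodeId -> Name -> Registries -> Maybe Pid
nsendVia owner name regs = Map.lookup owner regs >>= Map.lookup name

main :: IO ()
main = do
  let regs = register "nodeB" "worker" (Pid "nodeC" 7) Map.empty
  -- Sent to nodeB, but the registered process lives on nodeC.
  print (pidNode <$> nsendVia "nodeB" "worker" regs)
  -- Unregistered name: the message is silently dropped.
  print (pidNode <$> nsendVia "nodeB" "missing" regs)
```

The first lookup shows the case mentioned in the note above: the nsend is addressed to one node while the receiving pid belongs to another.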
Yes of course it is possible that a remote process may have gone away.
In my opinion, what is a bit odd about the process registry is not
registering remote processes, but the existence of nsend in the first
place. I would have expected the primitives to be simply register and
lookup, with named send being a derived operation rather than a
primitive. The difference of course is that with nsend to a remote node
(for delivery on that node), only one message need be sent, rather than
one for lookup and a follow-up with the main message.
But here's what's odd: we have a named send, but none of the
corresponding linking and monitoring. I can't actually see any sensible
way of using the nsend primitive for the case of sending to a remote
node for local delivery on that node. For reliability we must be able to
link or monitor, but nsend does not help us with that. But to link or
monitor we have to do a lookup to get the actual pid, so we can use the
link/monitor primitives. But if we look up the pid, then there's no need
for nsend, as we've lost its benefit. We would need something like
nlink, nmonitor. And then there's the thorny issue of the registry
changing between calls.
So, if one got rid of nsend and one always had to do lookups anyway,
then it makes perfect sense to allow registering remote pids in a local
registry (and local in a remote).
How do erlang programmers use nsend in the first place? Is it only used
for "fire and forget" in cases where reliability is not needed but the
latency of an extra round trip is critical? Seems very niche.
On Tue, 2015-01-20 at 20:04 +0000, Tim Watson wrote:
My reading of the unified semantics is that if you nsend to a process
that is not registered then nothing happens, it's just discarded. It is
also clear that the node controller removes the pid from the registry
when it is notified that it has died. It looks like it will do that for
local processes and for remote pids in the local registry when it
receives a link/monitor notification about that pid. So that'd indicate
that local processes are removed promptly, and remote ones would only be
removed if/when the nc gets a notification about them.
However, in the unified semantics the node controller is only notified
eventually (asynchronously) of a local process exiting. So it's still
possible in that semantics to nsend to a local process that is still in
the registry but has exited.
So you say that erlang guarantees that it's an error to send to a local
registered process that has died, whereas in the unified semantics
that's a no-op (it's either discarded when the registry lookup fails, or
when delivering to the now-dead process). But erlang cannot give that
guarantee for a send to a remote node with a locally registered process
on that node. Presumably in that case erlang also just silently discards
the message? (And given that one cannot link or monitor, how is it
used?)
> As for monitoring/linking, I agree this is something that's missing. We
> /could/ change nsend so that it works transparently, but the
> monitoring/linking issue would need to be dealt with by setting up monitors
> for all registered remote pids and clearing pids from the registry each
> time they die. That's quite a lot of overhead!
Not just overhead but too much policy for this level I'd say.
> An alternative which I prefer, would be your suggestion of implementing
> nsend in terms of whereis and send instead. The problem with that however
> is that there is an obvious race in it, so it's unlikely to be used much.
> Another alternative that IMHO works well is to take the approach beam uses
> and refuse to register remote pids, since there are no monitoring overheads
> for unregistered local processes that die.
It's not clear to me that there is a race, we take a snapshot of the
registry and send to the process that was registered at that moment
(which may no longer be the process registered, or it may no longer be
alive).
Given that we can do registry lookups, that's always an issue.
I honestly still don't see how anyone can use nsend or its erlang
equivalent in a sane way. Given that you cannot use linking or
monitoring with it the only way to use it seems to be to send
self-contained messages and where the sender waits for a reply with a
timeout (since no link or monitor).
So an issue with registering remote pids is knowing when to eventually
remove them. The unified semantics indicate that the nc removes them
when it gets a link/monitor message, but it's not guaranteed to get one
of those for a remote process.
> People that want remote registered pids and named send + monitoring of
> named pids can use my distributed-process-registry package, which provides
> exactly those features. It does of course incur all the obvious monitoring
> overheads, but use of it is then the choice of the programmer rather than
> something we impose on all users of the base distributed-process library.
Well, note that we'd only be paying for what we ask for. There's no
extra overhead if you stick to registering only local processes.
I'll try to summarise.
1. Registry.
1.1 It should be possible to register remote processes in the local
registry (and local processes in a remote registry).
1.2 If a remote process is stored in the registry then the Node Controller
starts monitoring that process.
1.3 If a process dies and we get the asynchronous notification then we
remove the process from the registry.
2. nsend
2.1 nsend should be implemented in terms of lookup (`whereis`) and send.
2.2 nsend doesn't raise any exceptions, regardless of whether a process is stored in
the registry or not.
Together this means that a message sent using nsend or nsendRemote is
not guaranteed to be delivered, and we need to call restart if we get a connection
failure (as that's a property of send).
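The rules above can be sketched as a small pure model. This is only an illustration under stated assumptions, not the node controller code in distributed-process. The record `NC` and the functions `newNC`, `register`, `processDied`, `whereis` and `nsend` are hypothetical names. The model also assumes that re-registering a name simply overwrites the old binding, which the summary does not specify.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for illustration; not the distributed-process types.
type NodeId = String
type Name   = String

data Pid = Pid { pidNode :: NodeId, pidSerial :: Int }
  deriving (Eq, Ord, Show)

-- State of one node controller in this toy model.
data NC = NC
  { ncNode      :: NodeId
  , ncRegistry  :: Map.Map Name Pid
  , ncMonitored :: Set.Set Pid   -- remote pids this controller monitors
  } deriving Show

newNC :: NodeId -> NC
newNC node = NC node Map.empty Set.empty

-- Rules 1.1 and 1.2: any pid may be registered; remote ones get monitored.
-- Assumption: an existing binding for the name is overwritten.
register :: Name -> Pid -> NC -> NC
register name pid nc = nc
  { ncRegistry  = Map.insert name pid (ncRegistry nc)
  , ncMonitored = if pidNode pid /= ncNode nc
                    then Set.insert pid (ncMonitored nc)
                    else ncMonitored nc
  }

-- Rule 1.3: on a death notification, drop every name bound to that pid.
processDied :: Pid -> NC -> NC
processDied pid nc = nc
  { ncRegistry  = Map.filter (/= pid) (ncRegistry nc)
  , ncMonitored = Set.delete pid (ncMonitored nc)
  }

whereis :: Name -> NC -> Maybe Pid
whereis name = Map.lookup name . ncRegistry

-- Rules 2.1 and 2.2: lookup, then send; an unknown name is not an error.
-- The result lists the (destination, message) pairs handed to plain send.
nsend :: Name -> msg -> NC -> [(Pid, msg)]
nsend name msg nc = maybe [] (\pid -> [(pid, msg)]) (whereis name nc)

main :: IO ()
main = do
  let leader = Pid "nodeB" 1
      nc0    = register "leader" leader (newNC "nodeA")
      nc1    = processDied leader nc0
  print (nsend "leader" "hello" nc0)  -- one send, addressed to the remote pid
  print (nsend "leader" "hello" nc1)  -- no send, and no exception either
```

Between `nc0` and `nc1` the only difference is the death notification, so the window in which a message is still sent to a dead process is exactly the time it takes that notification to arrive.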
Also, the required properties of monitoring are an open question:
1. Should monitoring a registry entry automatically unmonitor the old process and
monitor the new one when the registry entry
is updated?
2. Should we create a RegistryChanged monitor event?
3. If the monitored process should be updated on change, then could we
have races between
these events:
a. the process that was registered dies (so we should see a ProcessDied event)
b. the registry changes (if this event occurs before a. then we should not
see a ProcessDied event)
Do we have some way to create an online document, or a document
in the repository, where we could
record all the required semantics that everybody agrees on?
After this discussion I've pushed the following changes to the site (and Tim applied them):
2. Development wiki page, which I think may be used as a sandbox for discussing
changes to the semantics:
http://haskell-distributed.github.io/wiki/development.html
I also put there what we have decided (?) about the semantics. Monitoring
of the registry is not covered for now.
If no one has objections to this plan then I can start
working on it next week. The current proposal for new rules (it's not a change,
as these topics were not specified in the semantics) will solve the problem we
started with. But it's still not a full specification of the Registry. Some
work is still required.