This seems like an outstanding idea. If I understand correctly, you could have the first node that starts up on a host also start the erlpmd application. If the node running erlpmd goes down, one of the other nodes on the same host starts the erlpmd application. Do I have this right?
Cheers,
DBM
Dmitry Demeshchuk <demes...@gmail.com> wrote:
>
>1. When we send ALIVE2_REQ and reply with ALIVE2_RESP, we establish a TCP
>connection. Closing of which is a signal of node disconnection. This
>approach does have a point, since we can use keep-alive and periodically
>check that the node is still here on the TCP level.
No, the point is rather the opposite - since this is always a local
loopback connection, epmd is guaranteed (by the OS/kernel) to
"immediately" find out that the erlang node died (or disconnected), by
means of socket close (EOF) - no matter how the death came about. TCP
keep-alives, that by necessity incur a delay (and the default is
typically huge) before detection of a problem, are not only inferior but
pointless in this scenario.
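The EOF behaviour described above can be illustrated with a minimal sketch in Python (chosen only for brevity; nothing here is taken from epmd's source). A listener on the loopback interface stands in for epmd, and a client that closes its socket stands in for a node going away. The function names are hypothetical, and an explicit close() is only an approximation of a process dying, where the kernel closes the socket on the process's behalf.

```python
import socket
import threading


def wait_for_disconnect(listener, events):
    """Accept one connection and block until the peer goes away."""
    conn, _ = listener.accept()
    with conn:
        while True:
            try:
                data = conn.recv(1024)
            except ConnectionResetError:
                data = b""  # an abrupt reset is also a disconnect
            if not data:
                # An empty read is EOF: the peer's socket was closed.
                events.append("eof")
                return


def demo():
    # The listener plays the role of epmd (assumption: any free port).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    events = []
    watcher = threading.Thread(target=wait_for_disconnect,
                               args=(listener, events))
    watcher.start()

    # The client plays the role of a registered node that then disappears.
    node = socket.create_connection(("127.0.0.1", port))
    node.close()

    watcher.join(timeout=5)
    listener.close()
    return events
```

No keep-alive option is set anywhere in the sketch; the watcher learns of the disconnect from the empty read alone.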
--Per Hedeland
Hi,
BTW what about using SCTP for the distribution protocol?
Wouldn't it bring some advantages in the high availability area? E.g. because of multihoming, or the possibility of setting timeout and retransmission parameters?
Best regards,
Tomas
From: erlang-quest...@erlang.org [mailto:erlang-quest...@erlang.org] On Behalf Of Dmitry Demeshchuk
Sent: Wednesday, November 07, 2012 8:03 AM
To: erlang-questions
Subject: [erlang-questions] Future of epmd
Hello, list.
We have discussed implementing epmd in Erlang several times and think it would be a good idea for several reasons, especially if there is only one Erlang node per host. But it could work even if there are more nodes. Running a separate Erlang node just for the epmd service could also be an alternative. Unfortunately we have not been able to give this high enough priority yet, so this initiative is interesting (I have not looked at the code and other details).
The benefits with having epmd implemented in Erlang would be:
- Easier to maintain
- Easier to extend
- Easy to prototype other solutions, for example heterogeneous distribution,
  secure epmd communication via TLS, etc.
The client part is already written in Erlang, see the erl_epmd module.
/Kenneth , Erlang/OTP Ericsson
On 8 Nov 2012 09:59, "Kukosa, Tomas" <tomas....@siemens-enterprise.com> wrote:
>
> Hi,
>
> BTW what about using SCTP for the distribution protocol?
>
> Wouldn't it bring some advantages in the high availability area? E.g. because of multihoming, or the possibility of setting timeout and retransmission parameters?
Yes, it would be interesting to have distribution over SCTP, and it is possible to implement it with the same plugin approach that is used for distribution over SSL.
This might have implications for epmd as well, and maybe heterogeneous distribution
would be of interest too.
By heterogeneous distribution I mean that a node can talk SCTP with one node and TCP with another. It would require some negotiation and/or registration in an extended epmd, where a node can say which protocols it supports and prefers.
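The negotiation step could look roughly like the following sketch (Python, purely illustrative). It assumes a hypothetical extended epmd that stores, per node, the set of transports the node supports; the connecting node walks its own preference list and takes the first transport the peer also supports. Neither the function nor the data layout exists in epmd today.

```python
def choose_protocol(local_preferences, remote_supported):
    """Return the first locally preferred transport the peer supports.

    local_preferences: list of transport names, most preferred first.
    remote_supported:  set of transport names the peer registered
                       (hypothetical extended-epmd data).
    Returns None when the two nodes share no transport.
    """
    for protocol in local_preferences:
        if protocol in remote_supported:
            return protocol
    return None


# A node that prefers SCTP falls back to TCP for a TCP-only peer.
print(choose_protocol(["sctp", "tcp"], {"tcp"}))
# The same node uses SCTP with a peer that supports both.
print(choose_protocol(["sctp", "tcp"], {"sctp", "tcp"}))
```

Letting the connecting side's preference order win is just one possible policy; a real design would have to decide whose preference takes priority.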
/Kenneth, Erlang/OTP Ericsson
Hello, list.
As you may know, epmd sometimes misbehaves. For example, it loses nodes and doesn't add them back (unless you do some magic, like this: http://sidentdv.livejournal.com/769.html ).
A while ago, Peter Lemenkov had a wonderful idea: epmd could actually be written in Erlang instead. The EPMD protocol is very simple, and it's much easier to implement all the failover scenarios in Erlang than in C. So far, here's a prototype of his: https://github.com/lemenkov/erlpmd
While hacking on it, I noticed several things:
1. When we send ALIVE2_REQ and reply with ALIVE2_RESP, we establish a TCP connection, the closing of which signals node disconnection. This approach does have a point, since we can use keep-alive and periodically check that the node is still here on the TCP level. But then something weird follows:
2. When we send other control messages from a node connected to epmd, we establish a new TCP connection each time. We could use the main connection instead. Was this a design decision, or is it just a legacy thing?
3. The client (node) part of epmd seems to be implemented entirely in C and sealed inside ERTS. However, it seems like this code could be implemented inside the net_kernel module instead (or something similar).
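To make point 1 concrete, here is a minimal sketch of the registration exchange in Python (used only for brevity). The field layout follows the ALIVE2_REQ / ALIVE2_RESP description in the OTP distribution protocol documentation as best recalled here, so check it against the documentation before relying on it. The default values (node type 77 for a normal node, protocol 0 for TCP/IPv4, version 5) and the function names are assumptions for illustration.

```python
import struct

ALIVE2_REQ = 120   # request tag
ALIVE2_RESP = 121  # response tag


def encode_alive2_req(port, name, node_type=77, protocol=0,
                      highest=5, lowest=5, extra=b""):
    """Build an ALIVE2_REQ packet, including its 2-byte length prefix."""
    name_bytes = name.encode("latin-1")
    body = struct.pack(">BHBBHHH", ALIVE2_REQ, port, node_type, protocol,
                       highest, lowest, len(name_bytes))
    body += name_bytes
    body += struct.pack(">H", len(extra)) + extra
    # Requests to epmd are preceded by a big-endian 16-bit length.
    return struct.pack(">H", len(body)) + body


def decode_alive2_resp(data):
    """Parse the 4-byte ALIVE2_RESP: tag, result, creation."""
    tag, result, creation = struct.unpack(">BBH", data[:4])
    if tag != ALIVE2_RESP:
        raise ValueError("not an ALIVE2_RESP")
    return result, creation  # result 0 means the registration succeeded
```

After this exchange the node simply keeps the socket open; the other control messages mentioned in point 2 travel on separate connections.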
Why bother switching to Erlang when everything is already written and working? First of all, sometimes it doesn't really work in big clusters (see my first link). Secondly, using Erlang we can easily extend the protocol. For example, we could add an auto-discovery feature, which has been discussed on the list a lot; add the ability for a node to reconnect if its TCP session has been terminated for some reason; or add lookups of nodes by prefix (like, "give me all nodes that match mynode@*"). The list can probably be extended further.
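The prefix lookup, for instance, amounts to a pattern match over the registry. A minimal sketch in Python (illustrative only), assuming a hypothetical registry that maps full node names to ports; the function name and the registry layout are not part of the existing protocol:

```python
from fnmatch import fnmatchcase


def lookup_nodes(registry, pattern):
    """Return (name, port) pairs whose name matches a glob pattern.

    registry: dict mapping node name to distribution port
              (hypothetical layout, keyed by full node name).
    pattern:  shell-style glob such as "mynode@*".
    """
    return sorted((name, port) for name, port in registry.items()
                  if fnmatchcase(name, pattern))


registry = {
    "mynode@host1": 4001,
    "mynode@host2": 4002,
    "other@host1": 4003,
}
print(lookup_nodes(registry, "mynode@*"))
```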
Do you think such a thing (with full backwards compatibility, of course) could go upstream? Also, a question for the epmd maintainers: is epmd going to change at all, or is the protocol considered complete enough for its purposes?
--
Best regards,
Dmitry Demeshchuk