[erlang-questions] crash dump at ejabberd startup

t...@diogunix.com

Nov 16, 2010, 5:14:01 AM
to erlang-q...@erlang.org
Hello everybody in the Erlang universe,

I got stuck with a problem getting ejabberd to run in a FreeBSD8 Jail.
Erlang and ejabberd were built using the Ports Collection (this week).

As far as I could learn from googling all this, the issue goes back to the
Erlang environment and obviously has something to do with "inet", TCP, etc.
Although I know a bit about Unix, I can't make sense of the error messages
Erlang spits out (I have no clue what these messages actually want to tell
me):

ejabberd Start:

# ejabberdctl status
{error_logger,{{2010,11,16},{9,56,20}},"Protocol: ~p: register error: ~p~n",
["inet_tcp",{{badmatch,{error,epmd_close}},[{inet_tcp_dist,listen,1},
{net_kernel,start_protos,4},{net_kernel,start_protos,3},
{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}]}
{error_logger,{{2010,11,16},{9,56,20}},crash_report,[[{initial_call,
{net_kernel,init,['Argument__1']}},{pid,<0.19.0>},{registered_name,[]},
{error_info,{exit,{error,badarg},[{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}},{ancestors,[net_sup,kernel_sup,<0.9.0>]},
{messages,[]},{links,[#Port<0.101>,<0.16.0>]},{dictionary,
[{longnames,false}]},{trap_exit,true},{status,running},{heap_size,610},
{stack_size,24},{reductions,489}],[]]}
{error_logger,{{2010,11,16},{9,56,20}},supervisor_report,[{supervisor,
{local,net_sup}},{errorContext,start_error},{reason,
{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},
{mfargs,{net_kernel,start_link,[['ctl-17-ejabberd@localhost',shortnames]]}},
{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
{error_logger,{{2010,11,16},{9,56,20}},supervisor_report,[{supervisor,
{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,
[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},
{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
{error_logger,{{2010,11,16},{9,56,20}},std_info,[{application,kernel},
{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]}
{"Kernel pid
terminated",application_controller,"{application_start_failure,kernel,
{shutdown,{kernel,start,[normal,[]]}}}"}

Crash dump was written to: /var/log/ejabberd/erl_crash_20101116-095620.dump
Kernel pid terminated (application_controller)
({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})

Well, I wouldn't be surprised if all this had to do with the mysterious
"inetrc" configuration file. The example from the FreeBSD port contains:

# cat inetrc
{lookup,["file","native"]}.
{host,{127,0,0,1}, ["localhost","hostalias"]}.
{file, resolv, "/etc/resolv.conf"}.

I already tried some hints found via Google, but did not succeed or even
make progress.

The funny thing is, I once had ejabberd running as a test system in another
FreeBSD Jail but unfortunately did not note all details before wiping that
old host.

So, is anybody out there able to give any hints?
Understanding the problem might be 90% of getting the solution ...

Many thanks in advance !

kind regards
Tom


________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questio...@erlang.org

Michael Santos

Nov 16, 2010, 7:59:15 AM
to t...@diogunix.com, erlang-q...@erlang.org
On Tue, Nov 16, 2010 at 11:14:01AM +0100, t...@diogunix.com wrote:
> Hello everybody in the Erlang universe,
>
> I got stuck with a problem getting ejabberd to run in a FreeBSD8 Jail.
> Erlang and ejabberd were built using the Ports Collection (this week).
>
> As far as I could learn from googling all this, the issue goes back to the
> erlang environment and abviously has something to do with "inet", TCP etc..
> Well, even I know a bit about Unix I can't help myself with the error
> messages erlang spits out (no clue what these messages actually / precisely
> want to tell me):
>
> ejabberd Start:
>
> # ejabberdctl status
> {error_logger,{{2010,11,16},{9,56,20}},"Protocol: ~p: register error: ~p~n",
> ["inet_tcp",{{badmatch,{error,epmd_close}},[{inet_tcp_dist,listen,1},

When a distributed Erlang node starts up, it attempts to register itself
with an epmd daemon listening on localhost on port 4369 and starts one,
if an epmd is not running.

Check epmd is running and if the node is allowed to contact it from the
jail. You can get debugging info by running epmd manually: epmd -d -d -d
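
One way to check both points at once is to ask epmd for its list of registered names, which is what `epmd -names` does. The Python sketch below is only an illustration (the helper names are made up). It assumes the wire format described in the Erlang distribution protocol documentation: a 2-byte length followed by the request byte 110, and a reply consisting of a 4-byte port number followed by text lines.

```python
import socket
import struct

NAMES_REQ = 110  # request tag used by `epmd -names`


def build_names_req() -> bytes:
    """Return a NAMES_REQ: 2-byte big-endian length followed by the tag."""
    return struct.pack(">HB", 1, NAMES_REQ)


def parse_names_resp(data: bytes):
    """Split a NAMES_RESP into (epmd_port, [(node_name, node_port), ...])."""
    if len(data) < 4:
        raise ValueError("reply too short to contain the epmd port")
    (epmd_port,) = struct.unpack(">I", data[:4])
    nodes = []
    for line in data[4:].decode("latin-1").splitlines():
        # Each line is expected to look like: "name foo at port 12345"
        parts = line.split()
        if len(parts) == 5 and parts[0] == "name":
            nodes.append((parts[1], int(parts[4])))
    return epmd_port, nodes


def query_epmd(host: str = "127.0.0.1", port: int = 4369, timeout: float = 2.0):
    """Connect to epmd, send NAMES_REQ and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_names_req())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the peer closes the socket after replying
                break
            chunks.append(chunk)
    return parse_names_resp(b"".join(chunks))
```

Calling `query_epmd()` inside the jail should either raise a connection error (nothing reachable on port 4369) or return whatever nodes managed to register.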

t...@diogunix.com

Nov 16, 2010, 3:36:49 PM
to erlang-q...@erlang.org
Hello Michael,

thanks for helping :-)

> > ejabberd Start:
> >
> > # ejabberdctl status
> > {error_logger,{{2010,11,16},{9,56,20}},"Protocol: ~p: register error:
> > ~p~n",
> > ["inet_tcp",{{badmatch,{error,epmd_close}},[{inet_tcp_dist,listen,1},
>
> When a distributed Erlang node starts up, it attempts to register itself
> with an epmd daemon listening on localhost on port 4369 and starts one,
> if an epmd is not running.

I have a process running as Unix user "ejabberd", yes:
# ps axu
ejabberd 2355 0.0 0.1 3448 1284 ?? SJ 5:54AM 0:00.14
/usr/local/lib/erlang/erts-5.8.1/bin/epmd -daemon

I guess this process was started by the ejabberdctl script when it tried to
launch ejabberd.

EPMD tries to connect to "localhost"? Then I guess we are on the right track
now. FreeBSD jails have only a sort of "half own localhost", and thus one
has to configure the jail's IP address instead of localhost / 127.0.0.1
(which usually works for all other daemons).

Where can I configure Erlang to use the jail's IP instead of
localhost/127.0.0.1? In that mysterious "inetrc" config file?

However, there must be a way as I earlier had it running already in another
test host (half a year ago). Just cannot remember the details on how I did
it.



> Check epmd is running and if the node is allowed to contact it from the
> jail. You can get debugging info by running epmd manually: epmd -d -d -d

Ok, did that as
# kill -9 2355
# /usr/local/lib/erlang/erts-5.8.1/bin/epmd -d -d -d
epmd: Tue Nov 16 20:29:59 2010: epmd running - daemon = 0
epmd: Tue Nov 16 20:29:59 2010: try to initiate listening port 4369
epmd: Tue Nov 16 20:29:59 2010: starting
epmd: Tue Nov 16 20:29:59 2010: entering the main select() loop
epmd: Tue Nov 16 20:30:04 2010: time in seconds: 1289939404
epmd: Tue Nov 16 20:30:09 2010: time in seconds: 1289939409
epmd: Tue Nov 16 20:30:14 2010: time in seconds: 1289939414
epmd: Tue Nov 16 20:30:20 2010: time in seconds: 1289939420
epmd: Tue Nov 16 20:30:25 2010: time in seconds: 1289939425
epmd: Tue Nov 16 20:30:30 2010: time in seconds: 1289939430
epmd: Tue Nov 16 20:30:35 2010: time in seconds: 1289939435
epmd: Tue Nov 16 20:30:40 2010: time in seconds: 1289939440
...
...
^C

What could be the next step to test/check ?

kind regards
Tom

t...@diogunix.com

Nov 16, 2010, 4:08:47 PM
to erlang-q...@erlang.org
Just an additional thought:

> > > ejabberd Start:
> > >
> > > # ejabberdctl status
> > > {error_logger,{{2010,11,16},{9,56,20}},"Protocol: ~p: register error:
> > > ~p~n",
> > > ["inet_tcp",{{badmatch,{error,epmd_close}},[{inet_tcp_dist,listen,1},

Is it certain that the error message above refers to the connection between
Erlang and localhost:4369?

I also built and configured ejabberd to use MySQL, so maybe the connection
between Erlang's MySQL driver and MySQL could be meant? MySQL, however, is
happily running and already in use by Postfix and Dovecot in the same jail
...

Michael Santos

Nov 16, 2010, 6:22:02 PM
to t...@diogunix.com, erlang-q...@erlang.org
On Tue, Nov 16, 2010 at 09:36:49PM +0100, t...@diogunix.com wrote:

> I have a process running by unix user "ejabberd", yes:
> # ps axu
> ejabberd 2355 0.0 0.1 3448 1284 ?? SJ 5:54AM 0:00.14
> /usr/local/lib/erlang/erts-5.8.1/bin/epmd -daemon
>
> I guess, this process was started by the ejabberdctl skript when it tried to
> launch ejabberd.
>
> EPMD tries to connect to "localhost" ?

The Erlang node connects to epmd (on 127.0.0.1:4369). The TCP connection
is working, but the socket is closed immediately.

> Then we are on the right track now I
> guess FreeBSD jails have only sort of a "half own localhost" and thus one
> has to configure the Jails IP address instead of localhost / 127.0.0.1
> instead (what usually works for all other daemons).

Which version of Erlang are you using? R14B?

epmd in R14B was changed to allow some messages (like name registrations)
only from 127/8.
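
The restriction can be pictured as a membership test on the peer address of the TCP connection. The Python sketch below only illustrates that idea and is not epmd's actual C code; the function name is made up, and 192.0.2.10 is just a documentation address standing in for any non-loopback source.

```python
import ipaddress

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def is_local_peer(peer: str) -> bool:
    """True when the peer address lies in 127/8."""
    return ipaddress.ip_address(peer) in LOOPBACK_NET


for addr in ("127.0.0.1", "127.1.2.3", "192.0.2.10"):
    print(addr, "accepted" if is_local_peer(addr) else "rejected")
```

If the address epmd sees for the connecting node falls outside 127/8, a name registration would be refused under this rule, even when both processes run on the same machine.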

> Where can I configure Erlang to use the Jails IP instead localhost/127.0.0.1
> ? In that mysterious "inetrc" config file ?

inetrc is used for hostname resolution. See:

http://www.erlang.org/doc/apps/erts/inet_cfg.html

> However, there must be a way as I earlier had it running already in another
> test host (half a year ago). Just cannot remember the details on how I did
> it.
>
> > Check epmd is running and if the node is allowed to contact it from the
> > jail. You can get debugging info by running epmd manually: epmd -d -d -d
>
> Ok, did that as
> # kill -9 2355
> # /usr/local/lib/erlang/erts-5.8.1/bin/epmd -d -d -d
> epmd: Tue Nov 16 20:29:59 2010: epmd running - daemon = 0
> epmd: Tue Nov 16 20:29:59 2010: try to initiate listening port 4369
> epmd: Tue Nov 16 20:29:59 2010: starting
> epmd: Tue Nov 16 20:29:59 2010: entering the main select() loop
> epmd: Tue Nov 16 20:30:04 2010: time in seconds: 1289939404
> epmd: Tue Nov 16 20:30:09 2010: time in seconds: 1289939409
> epmd: Tue Nov 16 20:30:14 2010: time in seconds: 1289939414
> epmd: Tue Nov 16 20:30:20 2010: time in seconds: 1289939420
> epmd: Tue Nov 16 20:30:25 2010: time in seconds: 1289939425
> epmd: Tue Nov 16 20:30:30 2010: time in seconds: 1289939430
> epmd: Tue Nov 16 20:30:35 2010: time in seconds: 1289939435
> epmd: Tue Nov 16 20:30:40 2010: time in seconds: 1289939440
> ...
> ...
> ^C

Was epmd started up inside the jail? Did you bring up ejabberd as well?
I don't see any registration attempts.

> ["inet_tcp",{{badmatch,{error,epmd_close}},[{inet_tcp_dist,listen,1},

> {'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},

> is it sure the error message above means the connection between erlang
> and localhost:4369 ?

Just a guess :)

Here is the error message:

{badmatch,{error,epmd_close}}

When the ejabberd node starts up, it connects to 127.0.0.1:4369 and sends
an EPMD_ALIVE2_REQ message to register its name and distribution port (an
ephemeral port).

The code that does this is in inet_tcp_dist:listen/1 which calls
erl_epmd:register_node/2.

{ok, Creation} = erl_epmd:register_node(Name, Port)

epmd closes the connection immediately (possibly because the
EPMD_ALIVE2_REQ message is not allowed) and register_node/2 returns
{error, epmd_close}, causing the badmatch.
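
For reference, the registration message is small enough to build by hand. The Python sketch below follows the ALIVE2_REQ layout as I understand it from the Erlang distribution protocol documentation (2-byte length, tag 120, port, node type, protocol, highest and lowest version, name, extra). The function name and the example values (node name foo, port 31973) are assumptions chosen for illustration.

```python
import struct

ALIVE2_REQ = 120  # 0x78
NORMAL_NODE = 77  # 'M'; a hidden node would use 72


def build_alive2_req(name: str, port: int, version: int = 5) -> bytes:
    """Build the message a node sends to epmd to register its name and port."""
    raw_name = name.encode("latin-1")
    body = struct.pack(
        ">BHBBHHH",
        ALIVE2_REQ,
        port,           # distribution listen port of the node
        NORMAL_NODE,
        0,              # protocol: 0 = TCP/IPv4
        version,        # highest distribution version supported
        version,        # lowest distribution version supported
        len(raw_name),
    ) + raw_name + struct.pack(">H", 0)  # zero-length extra field
    # The 2-byte length prefix counts everything after itself.
    return struct.pack(">H", len(body)) + body


print(build_alive2_req("foo", 31973).hex(" "))
```

A short node name gives a packet of under twenty bytes, so the request that epmd rejects is a single small TCP segment.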

You can get the same error by starting up 2 erlang nodes (kill epmd if
it is running):

$ erl
Erlang R14B01 (erts-5.8.2) [source] [smp:2:2] [rq:2] [async-threads:0]
[hipe] [kernel-poll:false]

Eshell V5.8.2 (abort with ^G)
1> {ok, L} = gen_tcp:listen(4369, [{active, false}]).
2> {ok, S} = gen_tcp:accept(L), ok = gen_tcp:close(S).

And in another shell:

$ erl -name test
{error_logger,{{2010,11,16},{18,12,6}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,epmd_close}},[{inet_tcp_dist,listen,1},{net_kernel,start_protos,4},{net_kernel,start_protos,3},{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}]}

t...@diogunix.com

Nov 16, 2010, 6:45:34 PM
to erlang-q...@erlang.org
Hello,

just as a quick feedback:

> Which version of Erlang are you using? R14B?
>
> epmd in R14B was changed to allow some messages (like name registrations)
> only from 127/8.

as unix user "ejabberd":

$ erl
Erlang R14B (erts-5.8.1) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe]
[kernel-poll:false]

Eshell V5.8.1 (abort with ^G)

So does that mean there is a problem only with this particular version?
That would explain why it worked with an older version half a year ago but
not with the current one.


> > Where can I configure Erlang to use the Jails IP instead
> > localhost/127.0.0.1
> >
> > ? In that mysterious "inetrc" config file ?
>
> inetrc is used for hostname resolution. See:
>
> http://www.erlang.org/doc/apps/erts/inet_cfg.html

Thank you very much, I will dig into this immediately and then get back to
you.


> Was epmd started up inside the jail? Did you bring up ejabberd as well?
> I don't see any registration attempts.

Everything happened in this Jail only:
- building erlang, ejabberd, erlang-mysql driver
- starting ejabberd (trying to start it)
- all other command lines and resulting output I sent to this list so far


On the localhost pre-configuration of Erlang/epmd:

Do I have an option to change that behaviour and let Erlang/epmd try to
connect to the actual IP? This wouldn't harm anything, as there's a tight
firewall protecting access from outside the jail anyway.

Maybe the answer to this question will come out of reading the inetrc link
you sent me, but I have not gone through it yet. Will do that tonight.

kind regards
Tom

t...@diogunix.com

Nov 16, 2010, 8:12:25 PM
to erlang-q...@erlang.org
Hello Michael,

another suggestion on what could have caused the issue and how to resolve it.
The idea came to mind while reading
http://www.erlang.org/doc/apps/erts/inet_cfg.html
and remembering that another person had a similar problem, which that person
could resolve by adding a proper hostname to their FreeBSD jail.

Well, I of course HAVE a proper hostname configuration for this jail. But
maybe epmd just cannot find the hostname because, by default, it does not use
the regular methods?

This might explain why the connection was established at first but then
closed immediately.

If true, the issue can most probably be resolved by just editing the inetrc
file to make it use the correct methods for finding the hostname.

Ok, all this is wild speculation by a non erlang insider but however, here's
the standard inetrc file as delivered by FreeBSD's Ports Collection:

# cat inetrc.example
{lookup,["file","native"]}.
{host,{127,0,0,1}, ["localhost","hostalias"]}.
{file, resolv, "/etc/resolv.conf"}.

This example inetrc might work for a standard FreeBSD host but maybe needs
to be adapted for a FreeBSD jail. I'm just unsure how to change the values
there (as I do not understand one word of Erlang and am not sure I got it
right from your link).

kind regards
Tom

Michael Santos

Nov 16, 2010, 9:08:47 PM
to t...@diogunix.com, erlang-q...@erlang.org
On Wed, Nov 17, 2010 at 02:12:25AM +0100, t...@diogunix.com wrote:
> Hello Michael,
>
> another suggestion on what could have caused the issue and how to resolv it.
> The idea came into my mind while reading
> http://www.erlang.org/doc/apps/erts/inet_cfg.html
> and when remembering there was another person having a similar problem,
> which that person could resolve by adding a proper hostname to it's FreeBSD
> jail.

It might help. I think the quickest way to debug this would be to:

1. Start up epmd manually in the jail with the debug switches

epmd -d -d -d

2. Run tcpdump on port 4369 for the interfaces in the jail

3. Start up a distributed erlang node by hand:

erl -name foo

Report back on what you find!

t...@diogunix.com

Nov 16, 2010, 10:59:52 PM
to erlang-q...@erlang.org

> It might help. I think the quickest way to debug this would be to:

done, results below ...


> 1. Start up epmd manually in the jail with the debug switches
>
> epmd -d -d -d

I did that already before but repeated it nonetheless.

# epmd -d -d -d
epmd: Wed Nov 17 03:27:57 2010: epmd running - daemon = 0
epmd: Wed Nov 17 03:27:57 2010: try to initiate listening port 4369
epmd: Wed Nov 17 03:27:57 2010: starting
epmd: Wed Nov 17 03:27:57 2010: entering the main select() loop
epmd: Wed Nov 17 03:28:02 2010: time in seconds: 1289964482
epmd: Wed Nov 17 03:28:07 2010: time in seconds: 1289964487
epmd: Wed Nov 17 03:28:12 2010: time in seconds: 1289964492
epmd: Wed Nov 17 03:28:17 2010: time in seconds: 1289964497
epmd: Wed Nov 17 03:28:22 2010: time in seconds: 1289964502
^C

You mentioned before that it did not even try to connect
(I can't say anything about this).


> 2. Run tcpdump on port 4369 for the interfaces in the jail

# tcpdump -vv port 4369
tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96
bytes

tcpdump doesn't get anything when listening on the em0 interface.
So, I tried

# tcpdump -i lo0 -vv port 4369
tcpdump: WARNING: lo0: no IPv4 address assigned
tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 96
bytes


> 3. Start up a distributed erlang node by hand:
>
> erl -name foo

# erl -name foo
{error_logger,{{2010,11,17},{3,33,15}},"Protocol: ~p: register error: ~p~n",
["inet_tcp",{{badmatch,{error,epmd_close}},[{inet_tcp_dist,listen,1},
{net_kernel,start_protos,4},{net_kernel,start_protos,3},
{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}]}
{error_logger,{{2010,11,17},{3,33,15}},crash_report,[[{initial_call,
{net_kernel,init,['Argument__1']}},{pid,<0.19.0>},{registered_name,[]},
{error_info,{exit,{error,badarg},[{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}},{ancestors,[net_sup,kernel_sup,<0.9.0>]},
{messages,[]},{links,[#Port<0.56>,<0.16.0>]},{dictionary,
[{longnames,true}]},{trap_exit,true},{status,running},{heap_size,377},
{stack_size,24},{reductions,443}],[]]}
{error_logger,{{2010,11,17},{3,33,15}},supervisor_report,[{supervisor,
{local,net_sup}},{errorContext,start_error},{reason,
{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},
{mfargs,{net_kernel,start_link,[[foo,longnames]]}},{restart_type,permanent},
{shutdown,2000},{child_type,worker}]}]}
{error_logger,{{2010,11,17},{3,33,15}},supervisor_report,[{supervisor,
{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,
[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},
{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
{error_logger,{{2010,11,17},{3,33,15}},std_info,[{application,kernel},
{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]}
{"Kernel pid
terminated",application_controller,"{application_start_failure,kernel,
{shutdown,{kernel,start,[normal,[]]}}}"}

Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller)
({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})


and tcpdump output:

03:33:15.964680 IP (tos 0x0, ttl 64, id 24389, offset 0, flags [DF], proto
TCP (6), length 60, bad cksum 0 (->4f37)!)
mail.kepos.org.10975 > mail.kepos.org.4369: Flags [S], cksum 0xd692
(correct), seq 3123827793, win 65535, options [mss 16344,nop,wscale
3,sackOK,TS val 77847542 ecr 0], length 0
03:33:15.964701 IP (tos 0x0, ttl 64, id 24390, offset 0, flags [DF], proto
TCP (6), length 60, bad cksum 0 (->4f36)!)
mail.kepos.org.4369 > mail.kepos.org.10975: Flags [S.], cksum 0x776a
(correct), seq 3252621940, ack 3123827794, win 65535, options [mss
16344,nop,wscale 3,sackOK,TS val 2304704868 ecr 77847542], length 0
03:33:15.964712 IP (tos 0x0, ttl 64, id 24391, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->4f3d)!)
mail.kepos.org.10975 > mail.kepos.org.4369: Flags [.], cksum 0xbd56
(correct), seq 1, ack 1, win 8960, options [nop,nop,TS val 77847542 ecr
2304704868], length 0
03:33:15.964740 IP (tos 0x0, ttl 64, id 24392, offset 0, flags [DF], proto
TCP (6), length 70, bad cksum 0 (->4f2a)!)
mail.kepos.org.10975 > mail.kepos.org.4369: Flags [P.], cksum 0x6619
(correct), seq 1:19, ack 1, win 8960, options [nop,nop,TS val 77847542 ecr
2304704868], length 18
03:33:15.964896 IP (tos 0x0, ttl 64, id 24393, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->4f3b)!)
mail.kepos.org.4369 > mail.kepos.org.10975: Flags [F.], cksum 0xbd43
(correct), seq 1, ack 19, win 8960, options [nop,nop,TS val 2304704868 ecr
77847542], length 0
03:33:15.964906 IP (tos 0x0, ttl 64, id 24394, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->4f3a)!)
mail.kepos.org.10975 > mail.kepos.org.4369: Flags [.], cksum 0xbd43
(correct), seq 19, ack 2, win 8960, options [nop,nop,TS val 77847542 ecr
2304704868], length 0
03:33:15.964929 IP (tos 0x0, ttl 64, id 24395, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->4f39)!)
mail.kepos.org.10975 > mail.kepos.org.4369: Flags [F.], cksum 0xbd42
(correct), seq 19, ack 2, win 8960, options [nop,nop,TS val 77847542 ecr
2304704868], length 0
03:33:15.964937 IP (tos 0x0, ttl 64, id 24396, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->4f38)!)
mail.kepos.org.4369 > mail.kepos.org.10975: Flags [.], cksum 0xbd43
(correct), seq 2, ack 20, win 8959, options [nop,nop,TS val 2304704868 ecr
77847542], length 0
^C
8 packets captured
41 packets received by filter
0 packets dropped by kernel


Just to exclude any misconfiguration on the machine, I also tried it again
with the firewall switched off for this test: same result.


> Report back on what you find!

I fear there's no real enlightenment from this, at least not for me. We
already knew that we cannot make full use of the loopback interface because
of the virtualization in a FreeBSD jail.

Also, I still do not know what it actually is that Erlang does not like about
its environment.

Nonetheless there must be a way to get it up, as there are people out there
who are able to do it.

I'd like to ask again whether there is a way to make Erlang talk to the
actual IP instead of talking to localhost.

And I'm still unsure how to deal with the inetrc file. What I read from your
link (Michael) I find a bit unusual and confusing.

kind regards
Tom

t...@diogunix.com

Nov 17, 2010, 8:43:44 AM
to erlang-q...@erlang.org
Meanwhile I also ran some tests editing the inetrc file, having gone through
the Erlang docs regarding inetrc in the meantime:

No success unfortunately.

Additional information from the FreeBSD questions mailing list on the Erlang
R14B issue:

> I believe we are facing the same issue. Our setup is ejabberd-2.1.5 in
> a FreeBSD 7.3 amd64 jail and the workaround we found is running an
> older version of erlang (erlang-r13b04_3,1). You can find the old
> erlang port files in the FreeBSD CVS repository or here:

So, even though I cannot be sure yet, the issue might be caused by the
changes in Erlang R14B, as Michael said:

> epmd in R14B was changed to allow some messages (like name registrations)
> only from 127/8.

If true, one might consider making Erlang configurable regarding the IP
address.

In the short run, a second port in the FreeBSD Ports Collection (using an
older Erlang version) could ensure Erlang remains available in FreeBSD
jails.

Any other ideas / hints of course are still welcome.

kind regards
Tom

Michael Santos

Nov 17, 2010, 9:48:54 AM
to t...@diogunix.com, erlang-q...@erlang.org
On Wed, Nov 17, 2010 at 04:59:52AM +0100, t...@diogunix.com wrote:

> > It might help. I think the quickest way to debug this would be to:
>
> done, results below ...

Did you run these steps concurrently or sequentially? e.g., when you
brought up the erlang node, was the debug epmd running?

epmd (with debug switches) needs to be running in one shell, the tcpdump's
(with the "-n" switch) in another. Then start up the Erlang node.

> # tcpdump -i lo0 -vv port 4369
> tcpdump: WARNING: lo0: no IPv4 address assigned
> tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 96
> bytes

Was the output below from the dump of the loopback?

> and tcpdump output:
>
> 03:33:15.964680 IP (tos 0x0, ttl 64, id 24389, offset 0, flags [DF], proto
> TCP (6), length 60, bad cksum 0 (->4f37)!)
> mail.kepos.org.10975 > mail.kepos.org.4369: Flags [S], cksum 0xd692
> (correct), seq 3123827793, win 65535, options [mss 16344,nop,wscale
> 3,sackOK,TS val 77847542 ecr 0], length 0
> 03:33:15.964701 IP (tos 0x0, ttl 64, id 24390, offset 0, flags [DF], proto
> TCP (6), length 60, bad cksum 0 (->4f36)!)
> mail.kepos.org.4369 > mail.kepos.org.10975: Flags [S.], cksum 0x776a
> (correct), seq 3252621940, ack 3123827794, win 65535, options [mss
> 16344,nop,wscale 3,sackOK,TS val 2304704868 ecr 77847542], length 0
> 03:33:15.964712 IP (tos 0x0, ttl 64, id 24391, offset 0, flags [DF], proto
> TCP (6), length 52, bad cksum 0 (->4f3d)!)

> mail.kepos.org.4369 > mail.kepos.org.10975: Flags [F.], cksum 0xbd43

> (correct), seq 1, ack 19, win 8960, options [nop,nop,TS val 2304704868 ecr
> 77847542], length 0
> 03:33:15.964906 IP (tos 0x0, ttl 64, id 24394, offset 0, flags [DF], proto
> TCP (6), length 52, bad cksum 0 (->4f3a)!)
> mail.kepos.org.10975 > mail.kepos.org.4369: Flags [.], cksum 0xbd43
> (correct), seq 19, ack 2, win 8960, options [nop,nop,TS val 77847542 ecr
> 2304704868], length 0
> 03:33:15.964929 IP (tos 0x0, ttl 64, id 24395, offset 0, flags [DF], proto
> TCP (6), length 52, bad cksum 0 (->4f39)!)
> mail.kepos.org.10975 > mail.kepos.org.4369: Flags [F.], cksum 0xbd42
> (correct), seq 19, ack 2, win 8960, options [nop,nop,TS val 77847542 ecr
> 2304704868], length 0
> 03:33:15.964937 IP (tos 0x0, ttl 64, id 24396, offset 0, flags [DF], proto
> TCP (6), length 52, bad cksum 0 (->4f38)!)
> mail.kepos.org.4369 > mail.kepos.org.10975: Flags [.], cksum 0xbd43
> (correct), seq 2, ack 20, win 8959, options [nop,nop,TS val 2304704868 ecr
> 77847542], length 0
> ^C

Do not resolve IP addresses. We need to see the source and destination
IP addresses.

So something on port 4369 (on whatever "mail.kepos.org" is) is accepting
and closing the connection. If it is not epmd (because there is nothing
in your debug log), what is it?

> Just to exclude any misconfiguration on the machine, I also tried it again
> with the firewall switched off for this test: same result.

Always a good idea.

t...@diogunix.com

Nov 18, 2010, 2:55:40 PM
to erlang-q...@erlang.org, Michael Santos
Hello Michael,

sorry for the delay, I was out of the office for one day.

OK, to be precise, I repeated the test:

I first started tcpdump and let it run.
I then started epmd -d -d -d in a second shell.
I finally started erl -name foo in a third shell.

epmd -d -d -d now spat out something more informative:

# epmd -d -d -d

epmd: Thu Nov 18 19:26:33 2010: epmd running - daemon = 0
epmd: Thu Nov 18 19:26:33 2010: try to initiate listening port 4369
epmd: Thu Nov 18 19:26:33 2010: starting
epmd: Thu Nov 18 19:26:33 2010: entering the main select() loop
epmd: Thu Nov 18 19:26:38 2010: time in seconds: 1290108398
epmd: Thu Nov 18 19:26:43 2010: time in seconds: 1290108403
epmd: Thu Nov 18 19:26:44 2010: Non-local peer connected
epmd: Thu Nov 18 19:26:44 2010: time in seconds: 1290108404
epmd: Thu Nov 18 19:26:44 2010: opening connection on file descriptor 4
epmd: Thu Nov 18 19:26:44 2010: time in seconds: 1290108404
epmd: Thu Nov 18 19:26:44 2010: got 18 bytes
***** 00000000 00 10 78 7c e5 4d 00 00 05 00 05 00 03 66 6f 6f |..x|.M.......foo|
***** 00000010 00 00 |..|
epmd: Thu Nov 18 19:26:44 2010: time in seconds: 1290108404
epmd: Thu Nov 18 19:26:44 2010: ** got ALIVE2_REQ
epmd: Thu Nov 18 19:26:44 2010: ALIVE2_REQ from non local address
epmd: Thu Nov 18 19:26:44 2010: closing connection on file descriptor 4
epmd: Thu Nov 18 19:26:49 2010: time in seconds: 1290108409
epmd: Thu Nov 18 19:26:54 2010: time in seconds: 1290108414
epmd: Thu Nov 18 19:26:59 2010: time in seconds: 1290108419

The inetrc file I was using was:

{lookup,[file, dns]}.
{host,{64,120,5,168}, ["mail.kepos.org"]}.
{file, resolv, "/etc/resolv.conf"}.


I then tried the same with the following inetrc version:

{lookup,[file, dns]}.
{host,{127,0,0,1}, ["localhost"]}.
{file, resolv, "/etc/resolv.conf"}.


# epmd -d -d -d

epmd: Thu Nov 18 19:33:50 2010: epmd running - daemon = 0
epmd: Thu Nov 18 19:33:50 2010: try to initiate listening port 4369
epmd: Thu Nov 18 19:33:50 2010: starting
epmd: Thu Nov 18 19:33:50 2010: entering the main select() loop
epmd: Thu Nov 18 19:33:55 2010: time in seconds: 1290108835
epmd: Thu Nov 18 19:34:00 2010: time in seconds: 1290108840
epmd: Thu Nov 18 19:34:01 2010: Non-local peer connected
epmd: Thu Nov 18 19:34:01 2010: time in seconds: 1290108841
epmd: Thu Nov 18 19:34:01 2010: opening connection on file descriptor 4
epmd: Thu Nov 18 19:34:01 2010: time in seconds: 1290108841
epmd: Thu Nov 18 19:34:01 2010: got 18 bytes
***** 00000000 00 10 78 3a f4 4d 00 00 05 00 05 00 03 66 6f 6f |..x:.M.......foo|
***** 00000010 00 00 |..|
epmd: Thu Nov 18 19:34:01 2010: time in seconds: 1290108841
epmd: Thu Nov 18 19:34:01 2010: ** got ALIVE2_REQ
epmd: Thu Nov 18 19:34:01 2010: ALIVE2_REQ from non local address
epmd: Thu Nov 18 19:34:01 2010: closing connection on file descriptor 4
epmd: Thu Nov 18 19:34:06 2010: time in seconds: 1290108846
epmd: Thu Nov 18 19:34:11 2010: time in seconds: 1290108851
^C

Well, you are the expert, not me, but doesn't this look as if epmd does not
like getting the connection from a "non-local" address?
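
The 18 bytes in the debug output can be decoded to see what the node actually sent. The Python sketch below assumes the ALIVE2_REQ field layout from the Erlang distribution protocol documentation; the function name is made up, and the two byte strings are copied from the epmd hex dumps above.

```python
import struct

FIRST_DUMP = bytes.fromhex("00 10 78 7c e5 4d 00 00 05 00 05 00 03 66 6f 6f 00 00")
SECOND_DUMP = bytes.fromhex("00 10 78 3a f4 4d 00 00 05 00 05 00 03 66 6f 6f 00 00")


def decode_alive2_req(packet: bytes) -> dict:
    """Decode an ALIVE2_REQ as it appears in the epmd debug output."""
    # Fixed part: length, tag, port, node type, protocol,
    # highest version, lowest version, name length (13 bytes in total).
    length, tag, port, node_type, protocol, highest, lowest, name_len = struct.unpack(
        ">HBHBBHHH", packet[:13]
    )
    if tag != 120:
        raise ValueError("not an ALIVE2_REQ")
    name = packet[13:13 + name_len].decode("latin-1")
    return {
        "length": length,
        "port": port,
        "node_type": chr(node_type),
        "protocol": protocol,
        "versions": (lowest, highest),
        "name": name,
    }


for dump in (FIRST_DUMP, SECOND_DUMP):
    print(decode_alive2_req(dump))
```

Both packets decode as a registration for the node name foo, differing only in the port, so the request itself looks well-formed. That would be consistent with the rejection being about where the connection comes from rather than what it contains.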


tcpdump in both cases did not put out anything helpful
(see also my earlier posting).

# tcpdump -i lo0 -vv port 4369
tcpdump: WARNING: lo0: no IPv4 address assigned
tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 96
bytes

19:34:01.445154 IP (tos 0x0, ttl 64, id 42836, offset 0, flags [DF], proto
TCP (6), length 60, bad cksum 0 (->728)!)
mail.58928 > mail.4369: Flags [S], cksum 0x1c18 (correct), seq
1795656002, win 65535, options [mss 16344,nop,wscale 3,sackOK,TS val
221277388 ecr 0], length 0
19:34:01.445174 IP (tos 0x0, ttl 64, id 42837, offset 0, flags [DF], proto
TCP (6), length 60, bad cksum 0 (->727)!)
mail.4369 > mail.58928: Flags [S.], cksum 0xab70 (correct), seq
3890252666, ack 1795656003, win 65535, options [mss 16344,nop,wscale
3,sackOK,TS val 1967491061 ecr 221277388], length 0
19:34:01.445186 IP (tos 0x0, ttl 64, id 42838, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->72e)!)
mail.58928 > mail.4369: Flags [.], cksum 0xf15c (correct), seq 1, ack 1,
win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length 0
19:34:01.445214 IP (tos 0x0, ttl 64, id 42839, offset 0, flags [DF], proto
TCP (6), length 70, bad cksum 0 (->71b)!)
mail.58928 > mail.4369: Flags [P.], cksum 0x07d5 (correct), seq 1:19,
ack 1, win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length
18
19:34:01.445357 IP (tos 0x0, ttl 64, id 42846, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->726)!)
mail.4369 > mail.58928: Flags [F.], cksum 0xf149 (correct), seq 1, ack
19, win 8960, options [nop,nop,TS val 1967491061 ecr 221277388], length 0
19:34:01.445368 IP (tos 0x0, ttl 64, id 42847, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->725)!)
mail.58928 > mail.4369: Flags [.], cksum 0xf149 (correct), seq 19, ack
2, win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length 0
19:34:01.445391 IP (tos 0x0, ttl 64, id 42850, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->722)!)
mail.58928 > mail.4369: Flags [F.], cksum 0xf148 (correct), seq 19, ack
2, win 8960, options [nop,nop,TS val 221277388 ecr 1967491061], length 0
19:34:01.445400 IP (tos 0x0, ttl 64, id 42851, offset 0, flags [DF], proto
TCP (6), length 52, bad cksum 0 (->721)!)
mail.4369 > mail.58928: Flags [.], cksum 0xf149 (correct), seq 2, ack
20, win 8959, options [nop,nop,TS val 1967491061 ecr 221277388], length 0
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel

So I still guess there might be a problem because Erlang somehow insists on
using localhost only, which isn't a good thing for FreeBSD jails, as jails
have no fully functional localhost (127.0.0.1 and localhost exist and answer
pings, yes, but there are limitations nonetheless).

If there was a way to make Erlang use any configurable IP instead of
localhost, the issue almost probably was resolved.

I therefore tried varying the content of the inetrc file, but it seems
that's not enough to really point Erlang to the true IP address.

As you asked about mail.kepos.org and 64.120.5.168:
These are the jail's hostname and single IP address, as also properly used
by Postfix, Dovecot and MySQL in the same jail.

I meanwhile got a hint from the freebsd-questions mailing list: the person
posting there mentioned that they use Erlang 13B to run ejabberd 2.1.5 in
FreeBSD jails without any issue, because they had earlier faced the same
issue I have now (with Erlang 14B). This would also match my earlier
experience with an older Erlang half a year ago: no problems at that time
(but I cannot remember which Erlang version I used).

Of course I cannot be sure, but the recent changes in Erlang 14B you
mentioned might be the cause of all this. As I have no clue about Erlang,
I only want to mention this as one possibility.

What do you think?

kind regards
Tom

Michael Santos

Nov 18, 2010, 6:03:55 PM11/18/10
to t...@diogunix.com, erlang-q...@erlang.org
On Thu, Nov 18, 2010 at 08:55:40PM +0100, t...@diogunix.com wrote:

> Ok, to be precise, I repeated the test:
>
> I first started tcpdump and let it run.
> I then started epmd -d -d -d in a second shell.
> I finally started erl -name foo in a third shell.

Awesome, this is exactly what I needed to see! Thanks for being so
patient with this.

> epmd -d -d -d now spit out something more informative:

> ***** 00000000  00 10 78 7c e5 4d 00 00 05 00 05 00 03 66 6f 6f  |..x|.M.......foo|
> ***** 00000010  00 00                                            |..|
> epmd: Thu Nov 18 19:26:44 2010: time in seconds: 1290108404
> epmd: Thu Nov 18 19:26:44 2010: ** got ALIVE2_REQ
> epmd: Thu Nov 18 19:26:44 2010: ALIVE2_REQ from non local address
> epmd: Thu Nov 18 19:26:44 2010: closing connection on file descriptor 4

So it's confirmed the problem was that the source address was not the
one epmd expects.
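
For anyone following along, the 18 bytes in the hex dump above can be
decoded by hand. The sketch below is Python, not anything taken from epmd
itself. It assumes the ALIVE2_REQ layout described in the Erlang
distribution protocol documentation (2-byte length, tag 120, port, node
type, protocol, highest and lowest version, name, extra); the function name
decode_alive2_req is made up for illustration.

```python
import struct

# The 18 bytes shown in the "epmd -d -d -d" output quoted above.
DUMP = bytes.fromhex("00 10 78 7c e5 4d 00 00 05 00 05 00 03 66 6f 6f 00 00")

ALIVE2_REQ = 120  # ASCII 'x'
FIXED = ">BHBBHHH"  # tag, port, node type, protocol, highest, lowest, name length


def decode_alive2_req(packet: bytes) -> dict:
    """Decode an EPMD ALIVE2_REQ packet, including its 2-byte length prefix."""
    (length,) = struct.unpack_from(">H", packet, 0)
    body = packet[2:2 + length]
    if len(body) != length:
        raise ValueError("truncated packet")

    tag, port, node_type, protocol, highest, lowest, name_len = struct.unpack_from(
        FIXED, body, 0
    )
    if tag != ALIVE2_REQ:
        raise ValueError("not an ALIVE2_REQ (tag %d)" % tag)

    offset = struct.calcsize(FIXED)
    name = body[offset:offset + name_len].decode("latin-1")
    offset += name_len

    (extra_len,) = struct.unpack_from(">H", body, offset)
    extra = body[offset + 2:offset + 2 + extra_len]

    return {
        "port": port,              # distribution port the node listens on
        "node_type": node_type,    # 77 = normal node, 72 = hidden node
        "protocol": protocol,      # 0 = TCP/IPv4
        "highest_version": highest,
        "lowest_version": lowest,
        "name": name,
        "extra": extra,
    }


if __name__ == "__main__":
    print(decode_alive2_req(DUMP))
```

The request itself is well formed (node "foo" registering port 31973), so
epmd is rejecting it because of where it came from, not because of what it
contains.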

> The inetrc file I was using was:
>
> {lookup,[file, dns]}.
> {host,{64,120,5,168}, ["mail.kepos.org"]}.
> {file, resolv, "/etc/resolv.conf"}.

inetrc is used for DNS resolution. It won't affect this particular
issue.

> tcpdump: WARNING: lo0: no IPv4 address assigned
> tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 96
> bytes
> 19:34:01.445154 IP (tos 0x0, ttl 64, id 42836, offset 0, flags [DF], proto
> TCP (6), length 60, bad cksum 0 (->728)!)
> mail.58928 > mail.4369: Flags [S], cksum 0x1c18 (correct), seq
> 1795656002, win 65535, options [mss 16344,nop,wscale 3,sackOK,TS val
> 221277388 ecr 0], length 0

> As you asked for mail.kepos.org and 64.120.5.168:

> So, I still guess, there might be a problem as Erlang somehow insists on
> using localhost solely while this isn't a good thing for FreeBSD Jails as
> Jails just have no fully functional localhost (127.0.0.1 and localhost
> exist and answer for pings, yes, but there are limitations nonetheless).

Thanks! I've never used FreeBSD jails, so I was confused about how they
work. Erlang nodes are hard coded to connect to an epmd port on 127.0.0.1.
The jail apparently re-writes connections to localhost with the IP address
of the interface the jail is bound to. This behaviour is really sort of
nasty ;)
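
A minimal Python sketch of why this breaks. It assumes the check in epmd
amounts to "is the peer address in 127.0.0.0/8"; that is a paraphrase, not
a quote of the C source, and peer_is_local is a hypothetical name.

```python
import ipaddress


def peer_is_local(peer_ip: str) -> bool:
    """Return True only when the peer address is a loopback address."""
    return ipaddress.ip_address(peer_ip).is_loopback


if __name__ == "__main__":
    # Outside a jail the node connects from 127.0.0.1.
    print(peer_is_local("127.0.0.1"))
    # Inside the jail the source address becomes the jail's own IP.
    print(peer_is_local("64.120.5.168"))
```

With the jail's address as the source, a check like this treats the node as
a remote peer, which matches the "ALIVE2_REQ from non local address" line
in the epmd debug output.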

> If there was a way to make Erlang use any configurable IP instead of
> localhost, the issue would most probably be resolved.

I've attached a patch that disables the check.

I'll put together a better patch later. I can see a few ways of doing
this:

1. have a configurable source IP address as you suggested

2. checking the source IP is the same as the destination IP

3. checking the connection came over the loopback interface (probably no
portable way to do this)

4. have an option to disable the check (the old behaviour)

Aside from jails, I'm not sure anyone else would be affected by this. So
maybe option 4 is the way to go.
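
Options 1, 2 and 4 can be sketched as small predicates. This is Python for
illustration only and not the eventual epmd patch; the function names and
the allowed and check_enabled parameters are hypothetical. Option 3 is left
out because, as noted, there is probably no portable way to do it.

```python
import ipaddress
from typing import Iterable


def option1_configured(peer_ip: str, allowed: Iterable[str]) -> bool:
    """Option 1: accept loopback peers plus a configured list of addresses."""
    peer = ipaddress.ip_address(peer_ip)
    return peer.is_loopback or peer in {ipaddress.ip_address(a) for a in allowed}


def option2_same_address(peer_ip: str, local_ip: str) -> bool:
    """Option 2: accept when source and destination address are identical.

    In a server, peer_ip would come from getpeername() on the accepted
    socket and local_ip from getsockname() on the same socket.
    """
    peer = ipaddress.ip_address(peer_ip)
    return peer.is_loopback or peer == ipaddress.ip_address(local_ip)


def option4_check_disabled(peer_ip: str, check_enabled: bool) -> bool:
    """Option 4: a switch that restores the old behaviour (no check)."""
    return (not check_enabled) or ipaddress.ip_address(peer_ip).is_loopback


if __name__ == "__main__":
    jail_ip = "64.120.5.168"  # the jail's single address from this thread
    print(option1_configured(jail_ip, [jail_ip]))
    print(option2_same_address(jail_ip, jail_ip))
    print(option4_check_disabled(jail_ip, check_enabled=False))
```

In the tcpdump capture both ends of the connection show up as "mail", so
in the jail case source and destination are the same address and option 2
would accept it without any extra configuration.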


diff --git a/erts/epmd/src/epmd_srv.c b/erts/epmd/src/epmd_srv.c
index ef471a4..e2cc2dc 100644
--- a/erts/epmd/src/epmd_srv.c
+++ b/erts/epmd/src/epmd_srv.c
@@ -766,6 +766,9 @@ static int conn_open(EpmdVars *g,int fd)
dbg_tty_printf(g,2,(s->local_peer) ? "Local peer connected" :
"Non-local peer connected");

+ /* XXX allow local messages from all clients */
+ s->local_peer = EPMD_TRUE;
+
s->want = 0; /* Currently unknown */
s->got = 0;
s->mod_time = current_time(g); /* Note activity */

t...@diogunix.com

Nov 18, 2010, 6:59:02 PM11/18/10
to erlang-q...@erlang.org
Hello Michael,

> I've attached a patch that disables the check.

You're an angel, many thanks ! :-)


> I'll put together a better patch later. I can see a few ways of doing
> this:
>
> 1. have a configurable source IP address as you suggested
>
> 2. checking the source IP is the same as the destination IP
>
> 3. checking the connection came over the loopback interface (probably no
> portable way to do this)
>
> 4. have an option to disable the check (the old behaviour)
>
> Aside from jails, I'm not sure anyone else would be affected by this. So
> maybe option 4 is the way to go.

Regarding FreeBSD's jail virtualization:

Well, this is a mature kind of software virtualization which first appeared
several years ago (probably long before the current Linux virtualization
technologies popped up). Also, there's steady work to improve its
capabilities.

It's a pretty solid thing and worth supporting, as there are huge numbers
of people and companies (such as Yahoo) out there using it to properly
isolate different services running on a host, and for countless further
purposes. Maybe you also just want to honor the good and old and true and
"real" Unix: BSD ;-)

I agree that your option #4 would be good in the short run. As I don't know
enough about other virtualization technologies outside the BSD world, I can
only guess that your option #1 or #2 might be best in the longer run. Also,
I don't know for what purpose the Erlang community built in the check. There
might have been good reasons to do so, of course.

Ok, I'll try to apply your patch now but also will contact the Port
maintainer from the FreeBSD universe to keep him posted.
Will CC you on this.

Many thanks again !
kind regards
Tom

t...@diogunix.com

Nov 18, 2010, 8:57:59 PM11/18/10
to erlang-q...@erlang.org, Michael Santos
Hello Michael,

your patch worked like a charm :-)

Many thanks again !

Tom
