janus gateway crash for unknown reason


Firas Abd Alrahman

Jul 13, 2017, 10:05:30 AM
to meetecho-janus
Hi,
When we use the audiobridge with 10+ users for about 1-2 hours, Janus crashes.
I checked the log but did not find a clear reason.
I am using 64-bit Ubuntu.
libwebsockets was built with the flag -DLWS_MAX_SMP=1:

git clone https://github.com/warmcat/libwebsockets.git
cd libwebsockets
cmake -DLWS_MAX_SMP=1 .
make
sudo make install
ldconfig

Janus build script

sudo apt-get install libmicrohttpd-dev libjansson-dev libnice-dev libssl-dev libsrtp-dev libsofia-sip-ua-dev libglib2.0-dev libopus-dev libogg-dev libini-config-dev libcollection-dev pkg-config gengetopt automake libtool doxygen graphviz git cmake
sudo apt-get install libavformat-dev
mkdir -p ~/build
cd ~/build
git clone https://github.com/meetecho/janus-gateway.git
cd janus-gateway
sh autogen.sh
./configure --disable-data-channels --disable-rabbitmq --disable-docs --prefix=/opt/janus LDFLAGS="-L/usr/local/lib -Wl,-rpath=/usr/local/lib" CFLAGS="-I/usr/local/include"
make && sudo make install
sudo make configs
 

I use the latest Janus version (pulled 2 days ago) and the latest adapter.js.

Suddenly all wss connections time out until I restart Janus.
The Janus client reports "invalid handle" for all users, then the connection is lost.

Here is my Janus log, taken directly after the crash: https://drive.google.com/file/d/0B6p22eiQNdqtbVNTemlibDhXbkk/view?usp=sharing

Thank you!


Lorenzo Miniero

Jul 13, 2017, 1:08:30 PM
to meetecho-janus
Logs won't help debug a crash, we need a gdb stacktrace or something similar.

L.

Firas Abd Alrahman

Jul 13, 2017, 1:21:54 PM
to Lorenzo Miniero, meetecho-janus
Hello Lorenzo,
By 'crashed' I mean that the Janus process is still running but the socket is not responding (timeout).

Lorenzo Miniero

Jul 13, 2017, 1:26:29 PM
to meetecho-janus, lmin...@gmail.com
"Crash" means process crash, in our book: if the process is still alive it hasn't crashed, a timeout or a deadlock is an entirely different thing.
Try enabling the locking debug (either in janus.cfg or via admin.html) to enrich the logs, and then use the simple JS tool I wrote to try and see which lock is still held: https://github.com/meetecho/janus-gateway/issues/732#issuecomment-297767502
The discussion I linked will contain info on how to use it.

L. 

Firas Abd Alrahman

Jul 13, 2017, 1:47:06 PM
to Lorenzo Miniero, meetecho-janus
Thank you for the information, and sorry for my bad English.
Yes, it is a deadlock.
I added lock_debug=yes to janus.cfg and restarted. Is this correct,
or should I modify the variable in janus.c and recompile?

Should I also set ws_logging to 7 in the WebSockets config?


Firas Abd Alrahman

Jul 13, 2017, 5:28:13 PM
to Lorenzo Miniero, meetecho-janus
Hi,
Now it really is a crash, meaning the process terminated (I was confused before).

I found two crash reports (_opt_janus_bin_janus.0.crash, _opt_janus_bin_janus.33.crash) in /var/crash and am sharing them here.



Firas Abd Alrahman

Jul 14, 2017, 3:59:13 PM
to Lorenzo Miniero, meetecho-janus
Hi,
Did I send the right files for debugging this kind of problem?
I am testing with fewer than 10 users, and after some hours Janus crashes. I think this started happening after the last update; it was more stable before that.

Lorenzo Miniero

Jul 14, 2017, 4:01:15 PM
to Firas Abd Alrahman, meetecho-janus
Sorry, I'm abroad for a conference and won't be able to do any debugging until I come back. Hopefully someone else will help you looking into this in the meanwhile.

L.

Firas Abd Alrahman

Jul 14, 2017, 4:05:13 PM
to Lorenzo Miniero, meetecho-janus
Thank you, good luck!

Firas Abd Alrahman

Jul 15, 2017, 7:47:04 PM
to Lorenzo Miniero, meetecho-janus
Updates:

New crash files from /var/crash 

uname -a
Linux vps 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Janus was pulled and compiled on 15-jun.

It was far more stable before the update.
I just remembered that I had forgotten to update janus.js. I have updated it now and am waiting for results.



Lorenzo Miniero

Jul 16, 2017, 3:00:32 AM
to Firas Abd Alrahman, meetecho-janus
If you think an update caused this, please try previous revisions too to find the commit that in your opinion is to blame.

L.
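
One way to search for the commit to blame is `git bisect`. The following is a minimal sketch, assuming the checkout lives in ~/build/janus-gateway as in the build script earlier in the thread. GOOD_REV is a placeholder for a revision known to be stable; it is not a value taken from this thread.

```shell
#!/bin/sh
# Sketch: start a bisect between a known-good revision and the current HEAD.
# GOOD_REV (second argument) is a placeholder: use a commit or tag that was stable for you.
start_bisect() {
    repo=$1
    good_rev=$2
    cd "$repo" || return 1
    git bisect start &&
    git bisect bad HEAD &&          # the current checkout shows the crash
    git bisect good "$good_rev"     # last revision known to be stable
}

# Usage (hypothetical revision):
#   start_bisect ~/build/janus-gateway GOOD_REV
# After each step: rebuild, reinstall, test, then report the result with
#   git bisect good     # or: git bisect bad
# When finished, return to the original checkout with
#   git bisect reset
```

Since the crash here takes hours to show up, each bisect step needs a long enough test run before it can be marked good.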

Lorenzo Miniero

Jul 16, 2017, 4:24:10 AM
to meetecho-janus, doo...@gmail.com
PS: I can't check the content of the file you shared right now, but next time please make sure you share gdb backtraces, and not the core dumps you shared last time, as we can't use those.

L.
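
The files Ubuntu leaves in /var/crash are apport reports, which wrap the core dump. A minimal sketch for turning one into a text backtrace follows, assuming `apport-unpack` and `gdb` are installed and that it runs on the machine where the crash happened, so the binary and libraries match. The working directory is a hypothetical name.

```shell
#!/bin/sh
# Sketch: unpack an apport crash report and print a backtrace of all threads.
crash_to_backtrace() {
    report=$1     # e.g. /var/crash/_opt_janus_bin_janus.0.crash
    binary=$2     # e.g. /opt/janus/bin/janus
    workdir=$3    # new directory for the unpacked report (hypothetical name)
    apport-unpack "$report" "$workdir" || return 1
    # The unpacked report stores the core file as "CoreDump".
    gdb -batch \
        -ex "set pagination off" \
        -ex "thread apply all backtrace full" \
        "$binary" "$workdir/CoreDump"
}

# Usage:
#   crash_to_backtrace /var/crash/_opt_janus_bin_janus.0.crash \
#       /opt/janus/bin/janus ~/janus-crash > ~/janus-backtrace.txt 2>&1
```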

Firas Abd Alrahman

Jul 17, 2017, 4:43:22 PM
to meetecho-janus, doo...@gmail.com
Here is a gdb backtrace


Steps: 

gdb /opt/janus/bin/janus 2>&1 | tee ~/gdb-janus.txt

Then, at the (gdb) prompt:

handle SIG33 pass nostop noprint
set pagination 0
run

After the crash:

backtrace full
info registers
x/16i $pc
thread apply all backtrace
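
The same steps can also be run unattended, which helps when the crash only shows up after hours. A minimal sketch, assuming gdb's batch mode (`-batch` with `-ex` commands executed in order); the function name and log path are illustrative only.

```shell
#!/bin/sh
# Sketch: run a binary under gdb non-interactively and log the backtraces
# once it stops. The commands after "run" execute when the program
# crashes or exits.
run_under_gdb() {
    binary=$1
    logfile=$2
    gdb -batch \
        -ex "handle SIG33 pass nostop noprint" \
        -ex "set pagination off" \
        -ex "run" \
        -ex "backtrace full" \
        -ex "info registers" \
        -ex "x/16i \$pc" \
        -ex "thread apply all backtrace" \
        "$binary" 2>&1 | tee "$logfile"
}

# Usage:
#   run_under_gdb /opt/janus/bin/janus ~/gdb-janus.txt
```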

Firas Abd Alrahman

Jul 17, 2017, 4:46:15 PM
to meetecho-janus, doo...@gmail.com
The gdb backtrace only, without the Janus log.

Lorenzo Miniero

Jul 18, 2017, 3:14:22 AM
to meetecho-janus, doo...@gmail.com
Have you checked whether the same happens when you use plain WebSockets and proxy them via nginx/httpd/haproxy to do secure WebSockets with the client? (Note: I'm still abroad until the 22nd, so I can't do any digging until then.)

L.
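
For the httpd case, the proxy setup described above could look roughly like the fragment printed by this sketch. It assumes Apache with mod_proxy and mod_proxy_wstunnel enabled, and Janus listening for plain WebSockets on 127.0.0.1:8188. Both the port and the /janus-ws path are assumptions for illustration, so adjust them to the actual configuration.

```shell
#!/bin/sh
# Sketch: print an Apache fragment that forwards WebSocket traffic from the
# TLS virtual host to a plain-ws Janus listener on the same machine.
# The /janus-ws path and the default port 8188 are assumptions.
print_ws_proxy_conf() {
    janus_port=${1:-8188}
    cat <<EOF
# Place inside the existing <VirtualHost *:443> block.
# Requires: a2enmod proxy proxy_wstunnel
ProxyPass "/janus-ws" "ws://127.0.0.1:${janus_port}/"
EOF
}

# Usage:
#   print_ws_proxy_conf 8188
# Clients would then connect to wss://<your-host>/janus-ws
```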

Firas Abd Alrahman

Jul 18, 2017, 5:41:40 AM
to meetecho-janus
Thank you, I appreciate your reply. I will set up the proxy solution until the problem is fixed.
I am starting to believe the problem began after I pulled libwebsockets 2-3 weeks ago.
I will test the wss setup again when you, or someone else, is free to help.

Firas Abd Alrahman

Jul 18, 2017, 2:14:32 PM
to meetecho-janus
ws is working fine behind the Apache proxy. Is this setup stable for production?
wss => apache => janus ws ... ?

Mirko Brankovic

Jul 18, 2017, 2:49:45 PM
to meetecho-janus
Maybe you can try libwebsockets 1.7 or 2.0; I didn't have any issues with those versions.


--
You received this message because you are subscribed to the Google Groups "meetecho-janus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to meetecho-janus+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Firas Abd Alrahman

Jul 18, 2017, 4:56:59 PM
to meetecho-janus
Thank you Mirko,
Even after switching to ws behind the Apache proxy, Janus crashed after some hours!

I have now gone back to libwebsockets 1.7.9 per your advice, and am running Janus in a gdb session.
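
Pinning libwebsockets to a specific release can be sketched as below, reusing the build steps from the first message. The tag name passed in (for example v1.7.9) is an assumption about how the upstream repository names its tags; `git tag -l` in the clone shows the names that actually exist.

```shell
#!/bin/sh
# Sketch: build and install libwebsockets at a given tag.
# The tag argument is an assumption; list the real tag names with `git tag -l`.
build_lws() {
    tag=$1
    git clone https://github.com/warmcat/libwebsockets.git &&
    cd libwebsockets &&
    git checkout "$tag" &&
    cmake -DLWS_MAX_SMP=1 . &&
    make &&
    sudo make install &&
    sudo ldconfig
}

# Usage (run from the directory that should hold the clone):
#   build_lws v1.7.9
```

Janus needs to be rebuilt afterwards so that it links against the newly installed library.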


Firas Abd Alrahman

Jul 19, 2017, 7:06:14 AM
to meetecho-janus
Crashed again:
  • Latest janus.js, adapter.js, janus compile (1 day ago)
  • libwebsocket 1.7.9 compiled with -DLWS_MAX_SMP=1
  • janus ws over apache wss proxy
  • os: Linux vps 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


#0  0x00007ffff7672393 in g_list_last () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#1  0x00007ffff76723df in g_list_append () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#2  0x000000000043bfba in janus_ice_send_thread (data=0x7fffbc00b400) at ice.c:3817
 

Mirko Brankovic

Jul 19, 2017, 7:52:16 AM
to meetecho-janus
So the problem appears at line 3817 (shown in bold in the original message), where the packet is saved for retransmission:
3808    if(max_nack_queue > 0) {
3809            /* Save the packet for re-transmissions that may be needed later */
3810            janus_rtp_packet *p = (janus_rtp_packet *)g_malloc0(sizeof(janus_rtp_packet));
3811            p->data = (char *)g_malloc0(protected);
3812            memcpy(p->data, sbuf, protected);
3813            p->length = protected;
3814            p->created = janus_get_monotonic_time();
3815            p->last_retransmit = 0;
3816            janus_mutex_lock(&component->mutex);
3817            component->retransmit_buffer = g_list_append(component->retransmit_buffer, p);
3818            janus_mutex_unlock(&component->mutex);
3819    }





--
Regards,
Mirko

Firas Abd Alrahman

Jul 19, 2017, 8:06:37 AM
to meetecho-janus
Yes Mirko, but I cannot dig any deeper into the C code!
At least it is not libwebsockets this time...
Waiting for something good to happen :(


Mirko Brankovic

Jul 19, 2017, 8:32:11 AM
to meetecho-janus
But from the trace I see that your ice.c is longer than the one in master now (4042 lines), or am I reading it wrongly?

Thread 1 (Thread 0x7ffff7fda880 (LWP 7645)):
#0  0x00007ffff5e6c30d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff5e9dd54 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
#2  0x000000000045daf2 in main (argc=1, argv=0x7fffffffe688) at janus.c:4222
(gdb) Quit

Are you sure you are running the version from master?
The last thing Lorenzo implemented was a timer in the ICE loop to give 'slow' clients more time to start the DTLS handshake (commit: 948f3cbea23b3bd82729d157c1329a3bb2934253).

So maybe you can try with janus 2.3 version and see if you get the same problem?

Thanks,
mirko

Mirko Brankovic

Jul 19, 2017, 8:35:07 AM
to meetecho-janus
oh sorry this is janus.c not ice.c :D
my mistake 
--
Regards,
Mirko

Firas Abd Alrahman

Aug 1, 2017, 9:21:13 AM
to meetecho-janus
This was finally solved by installing a fresh Ubuntu with updates disabled.

I had suspected that the crashes started after updating Ubuntu packages.
Now, after many days of testing with 10-20 concurrent users at random times, there have been no crashes at all.

Disabling Linux updates will lead to security issues, but I have no other option currently. I think a library conflict is happening somehow.

Or maybe using the latest announced Janus master, 0.2.5, solved the issue? I do not know...

Lorenzo Miniero

Aug 1, 2017, 9:28:05 AM
to meetecho-janus
You can try updating one dependency at a time, to see if you can narrow down what's causing it to break for you.

L.
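
Updating one dependency at a time could be sketched with apt as follows: hold the suspect packages, then release and upgrade them one by one, testing between steps. The package names in the usage lines are placeholders, not a confirmed list of what changed on this system.

```shell
#!/bin/sh
# Sketch: freeze a set of packages, then upgrade them individually.
# Package names are supplied by the caller; `apt list --upgradable`
# shows the current candidates.
hold_packages() {
    sudo apt-mark hold "$@"
}

upgrade_one() {
    pkg=$1
    sudo apt-mark unhold "$pkg" &&
    sudo apt-get install --only-upgrade "$pkg"
}

# Usage (placeholder package names):
#   hold_packages libnice10 libglib2.0-0 libssl1.0.0
#   upgrade_one libnice10      # then run Janus long enough to see if it crashes
#   upgrade_one libglib2.0-0   # repeat for the next package
```

Holding only the libraries Janus links against leaves the rest of the system free to receive security updates.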