Janus crashing? Auto-restart using systemd. What's the best way to debug?


Kaplan

Sep 27, 2016, 5:16:17 PM
to meetecho-janus
Hi,
What is the best way to troubleshoot a Janus crash? I am on the latest master, running on Debian jessie.
I am using the built-in HTTPS transport.
I have the log level set to 6, and Janus is controlled by systemd on Debian, so if the process dies, it auto-restarts.
Here is my systemd service:

[Unit]
Description=Janus WebRTC gateway
After=network.target

[Service]
Type=simple
ExecStart=/opt/janus/bin/janus -o
Restart=on-abnormal
Nice=-10
LimitCORE=infinity
LimitNOFILE=100000
LimitNPROC=60000
LimitSTACK=250000
LimitRTPRIO=infinity
LimitRTTIME=7000000
IOSchedulingClass=realtime
IOSchedulingPriority=2
CPUSchedulingPolicy=rr
CPUSchedulingPriority=89

[Install]
WantedBy=multi-user.target


Here is the log at the point of the crash:

Sep 27 15:12:06 m5 janus[27319]: [Tue Sep 27 15:12:06 2016] We have a message to serve...
Sep 27 15:12:06 m5 janus[27319]: {
Sep 27 15:12:06 m5 janus[27319]: "janus": "event",
Sep 27 15:12:06 m5 janus[27319]: "session_id": 2679895579916547,
Sep 27 15:12:06 m5 janus[27319]: "sender": 1362825805019229,
Sep 27 15:12:06 m5 janus[27319]: "plugindata": {
Sep 27 15:12:06 m5 janus[27319]: "plugin": "janus.plugin.audiobridge",
Sep 27 15:12:06 m5 janus[27319]: "data": {
Sep 27 15:12:06 m5 janus[27319]: "audiobridge": "joined",
Sep 27 15:12:06 m5 janus[27319]: "room": 1003,
Sep 27 15:12:06 m5 janus[27319]: "participants": [
Sep 27 15:12:06 m5 janus[27319]: {
Sep 27 15:12:06 m5 janus[27319]: "id": 2285189696780449,
Sep 27 15:12:06 m5 janus[27319]: "display": "b84950e8-1c65-4ff2-bd44-77a693073e0e",
Sep 27 15:12:06 m5 janus[27319]: "muted": true
Sep 27 15:12:06 m5 janus[27319]: }
Sep 27 15:12:06 m5 janus[27319]: ]
Sep 27 15:12:06 m5 janus[27319]: }
Sep 27 15:12:06 m5 janus[27319]: }
Sep 27 15:12:22 m5 systemd[1]: janus.service: main process exited, code=killed, status=11/SEGV
Sep 27 15:12:22 m5 systemd[1]: Unit janus.service entered failed state.
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] Got a Janus API request from janus.transport.http (0x7f31a8000d80)
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [ERR] [transports/janus_http.c:janus_http_handler:1294] Couldn't find any session 7434939135210000...
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] Transport task pool, serving request
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [ERR] [janus.c:janus_process_incoming_request:695] Couldn't find any session 7434939135210000...
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [DJkzTEkJgm4] Returning Janus API error 458 (No such session 7434939135210000)
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] Got a Janus API request from janus.transport.http (0x7f319c000da0)
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [ERR] [transports/janus_http.c:janus_http_handler:1294] Couldn't find any session 7274935415163635...
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] Transport task pool, serving request
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [ERR] [janus.c:janus_process_incoming_request:695] Couldn't find any session 7274935415163635...
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [gXk8InKWmmC] Returning Janus API error 458 (No such session 7274935415163635)
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] Got a Janus API request from janus.transport.http (0x7f31a8004950)
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [ERR] [transports/janus_http.c:janus_http_handler:1294] Couldn't find any session 758740164516021...
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] Transport task pool, serving request
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [ERR] [janus.c:janus_process_incoming_request:695] Couldn't find any session 758740164516021...
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] [hqHddkiGRq6] Returning Janus API error 458 (No such session 758740164516021)
Sep 27 15:12:23 m5 janus[2012]: [Tue Sep 27 15:12:23 2016] Got a Janus API request from janus.transport.http (0x7f319c000da0)



The server seemed to have crashed a few times for me today, under a load of 60 participants.  Systemd restarted it.
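
To put a number on "a few times", the systemd journal can be filtered for the SEGV exit line shown in the log above. This is a minimal sketch: the helper name count_segv_exits is made up for illustration, and it assumes the journal line contains the text status=11/SEGV as it does in this thread.

```shell
# Count journal lines that report the main process dying with SIGSEGV.
# Reads journal text on stdin so it can be fed from journalctl or a saved log.
# (count_segv_exits is a hypothetical helper name, not part of Janus or systemd.)
count_segv_exits() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'status=11/SEGV'
}

# Example usage on a systemd host (unit name assumed to be janus.service):
#   journalctl -u janus.service --since today | count_segv_exits
```

The unit name janus.service is taken from the log lines above; adjust it if your unit is named differently.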


I am using the built-in HTTPS transport with long polling, rather than fronting Janus with nginx, so I'm not sure whether that is the culprit. I read in the group a while ago that the built-in HTTPS was buggy.

Kaplan

Sep 27, 2016, 8:34:11 PM
to meetecho-janus
Looking at the group posts, I updated to a manual install of libsrtp v1.5.4 instead of the default Debian jessie one...
Fingers crossed.

Lorenzo Miniero

Sep 28, 2016, 2:00:29 AM
to meetecho-janus
Please DON'T paste huge chunks of text in posts or issues, as explained in the guidelines. It's better to rely on services like gist or pastebin; otherwise I find it very hard to read the meaningful parts of messages.
As for debugging in case libsrtp is not the cause, check https://janus.conf.meetecho.com/docs/debug

L.

Kaplan

Sep 28, 2016, 6:58:02 AM
to meetecho-janus
Thanks for the link Lorenzo!

Kaplan

Sep 28, 2016, 3:51:09 PM
to meetecho-janus

Trying to debug my crash. I am on an x64 debian jessie machine. I installed libasan1.


I also removed the stock libsrtp that came with the OS, then installed and configured v1.5.4 per the instructions on GitHub (I used --libdir=/usr/lib64).

Then I ran make clean && make && make install in the libsrtp v1.5.4 folder.

Then, for Janus, I ran make clean, followed by:

CFLAGS="-fsanitize=address -fno-omit-frame-pointer" LDFLAGS="-lasan" ./configure --prefix=/opt/janus --disable-data-channels -libdir=/usr/lib64


I still get the crash after restarting Janus. Inspecting with ldd shows me /usr/lib/libsrtp.so.0 rather than /usr/lib64. Am I doing something wrong?

ldd shows:

root@m5:/usr/local/src/janus-gateway# ldd janus | grep asan

        libasan.so.1 => /usr/lib/x86_64-linux-gnu/libasan.so.1 (0x00007f37da7b5000)


root@m5:/usr/local/src/janus-gateway# ldd janus | grep srtp

        libsrtp.so.0 => /usr/lib/libsrtp.so.0 (0x00007fb8e232a000)


Shouldn't it say /usr/lib64?
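
One way to check this repeatedly is to pull the resolved path out of the ldd output. The sketch below assumes the usual "name => path (address)" layout that ldd prints in this post; resolved_lib_path is a hypothetical helper name.

```shell
# Print the path that ldd resolved for any library whose name contains $1.
# Reads ldd output on stdin, e.g.:  ldd ./janus | resolved_lib_path srtp
# (resolved_lib_path is a hypothetical helper, not an existing tool.)
resolved_lib_path() {
    # Fields in a typical ldd line: <soname> => <resolved path> (<load address>)
    awk -v lib="$1" 'index($1, lib) > 0 && $2 == "=>" { print $3 }'
}
```

Note that the path ldd reports comes from the dynamic linker's own search order at run time (its cache and default directories), not from the --libdir passed to configure. If an older copy is still present in /usr/lib, or /usr/lib64 is not among the directories the linker searches on this system, the old library can still be the one picked up; ldconfig -p | grep srtp lists what the linker cache currently knows about.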



Kaplan

Sep 28, 2016, 4:28:12 PM
to meetecho-janus
I also compared this particular server to another identical server where, knock on wood, the crash does not happen. Both run the same Janus master HEAD version.
However, the one where it crashes was configured like this:

./configure --prefix=/opt/janus --disable-data-channels -libdir=/usr/lib64


The one where, so far, it does not crash:

 ./configure --prefix=/opt/janus --disable-websockets --disable-data-channels --disable-rabbitmq --no-create --no-recursion


So I wonder if having WebSockets or RabbitMQ compiled in has anything to do with it... I will try ASan ASAP ;)



Kaplan

Sep 28, 2016, 7:00:30 PM
to meetecho-janus
With AddressSanitizer loaded, Janus won't start:
I am probably doing something wrong.


Kaplan

Sep 28, 2016, 7:24:49 PM
to meetecho-janus
Seems like I have the wrong version of libasan :( not 64-bit



Chad Furman

Sep 29, 2016, 1:27:27 AM
to meetecho-janus
I see no mention of gdb anywhere in this post thread.  What you want to do is:

1) install gdb
2) run the command "$ gdb /opt/janus/bin/janus core" where $ represents the terminal prompt, /opt/janus/bin/janus is the path to the executable that crashed (must be the same executable, new versions will cause addresses to be different), and "core" is the coredump produced by the crashed Janus instance.  Core dumps are usually found in the folder where Janus was invoked from.  That is, if you're in "/" when you run "/opt/janus/bin/janus" and Janus crashes, then the core is "/core"
3) with gdb running the program and core dump, type "backtrace" or "bt" to get the line that caused the crash
4) use other gdb commands to inspect the memory around that line, like "frame" to jump to one of the frames of the backtrace, and "info locals" to list the local variables at that address.

If you see things like "optimized out" then you'll either need to read through the ASM or you'll need to recompile with no-optimize attributes on the function in question.  For example:

__attribute__((optimize("O0"))) void my_func()

That's O as in Omega and 0 as in zero.  O0
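
Steps 2 and 3 can also be run non-interactively, which is handy for attaching a backtrace to a gist. A minimal sketch, assuming gdb is installed and that the binary and core paths are passed in by the caller; core_backtrace is a hypothetical helper name.

```shell
# Print backtraces from a core file without entering the interactive gdb prompt.
# $1 = path to the exact binary that crashed, $2 = path to its core file.
# (core_backtrace is a hypothetical helper name.)
core_backtrace() {
    cb_binary=$1
    cb_core=$2
    if [ ! -r "$cb_binary" ] || [ ! -r "$cb_core" ]; then
        echo "cannot read binary or core file" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb is not installed" >&2
        return 127
    fi
    # -batch exits after running the -ex commands:
    # "bt" shows the crashing thread, "thread apply all bt" shows every thread.
    gdb -batch -ex "bt" -ex "thread apply all bt" "$cb_binary" "$cb_core"
}

# Example usage, with the paths mentioned in this thread:
#   core_backtrace /opt/janus/bin/janus /core > backtrace.txt
```

Since Janus is multi-threaded, the all-threads backtrace is often more useful than the single "bt", because the thread that received the signal is not always the one that corrupted memory.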



Lorenzo Miniero

Sep 29, 2016, 4:43:20 AM
to meetecho-janus

gdb is mentioned in the debug page of our documentation, together with libasan, as they're simply different ways of investigating an issue. libasan helps more when you want to track leaks or heap issues, as you can consider it a lightweight valgrind. I didn't know about the per-function no-optimize attribute, thanks for pointing that out.

L.

Kaplan

Sep 29, 2016, 9:12:28 AM
to meetecho-janus
Thanks Chad & Lorenzo! With your instructions I was able to get my missing core. I couldn't find it anywhere before, but now I see it in /core/. I'll wait for the next crash, as I am sure I have recompiled Janus with different settings since this core was generated, so there will most likely be a mismatch in the addresses.

Igor Khomenko

Dec 22, 2016, 12:00:32 PM
to meetecho-janus

I have almost the same scenario: Janus sometimes just crashes, with nothing interesting in the logs, just regular Janus logs.

I'm trying to use "$ gdb /opt/janus_dev/bin/janus core" but I do not see the 'core' file in the directory where I run Janus.

I run it this way:  "sudo /opt/janus_dev/bin/janus -b -L /var/log/janus.log"

Should I enable anything to see that 'core' file?

Lorenzo Miniero

Dec 22, 2016, 12:39:42 PM
to meetecho-janus
Core files depend on the OS. In some cases playing with ulimit is enough; in others (e.g., Ubuntu) you have to edit things like /proc/sys/kernel/core_pattern as well. No idea what may be required in other cases. Anyway, this is out of scope for Janus itself.
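
As a starting point, the two settings mentioned here can be inspected from the shell that will launch Janus. This is only a sketch for a Linux system; core_dump_status is a hypothetical helper name.

```shell
# Show the settings that usually decide whether and where a core file is written.
# (core_dump_status is a hypothetical helper name.)
core_dump_status() {
    # A soft limit of 0 means no core file is written for processes
    # started from this shell.
    echo "core file size limit (soft): $(ulimit -S -c)"
    # core_pattern names the file, or pipes the dump to a handler
    # program when it starts with "|".
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi
}

core_dump_status
```

If the limit prints 0, running ulimit -c unlimited in the same shell before starting Janus lifts it for that session. When Janus runs under systemd, the LimitCORE= line in the unit file plays the same role.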

L.

Igor Khomenko

Dec 23, 2016, 5:56:05 AM
to meetecho-janus
Lorenzo, by 'playing with ulimit', do you mean to fix these crashes, or to find these core dump files?

Horsetopus

Dec 23, 2016, 6:08:47 AM
to meetecho-janus
ulimit is what sets the limits on different resources in UNIX systems.
I had crashes with Janus due to the number of file descriptors being set too low.

Just try this in a terminal:
ulimit -n 64000
And then run Janus. Hopefully this will solve your problem.

I don't know if it is because I run on a VM, but this is reset to the default at reboot.
So I launch Janus from a script, and always run ulimit first.
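
A launch script along these lines might look like the sketch below. It assumes a POSIX-style shell; raise_nofile is a hypothetical helper name, and the Janus path in the usage comment is the one used earlier in this thread.

```shell
# Set the soft open-files limit to $1, capped at the hard limit so the call
# does not fail for unprivileged users. (raise_nofile is a hypothetical helper.)
raise_nofile() {
    rn_want=$1
    rn_hard=$(ulimit -H -n)
    if [ "$rn_hard" != "unlimited" ] && [ "$rn_want" -gt "$rn_hard" ]; then
        rn_want=$rn_hard
    fi
    ulimit -S -n "$rn_want"
}

# Example usage in a wrapper script:
#   raise_nofile 64000
#   exec /opt/janus/bin/janus
```

The limit applies to the current shell and the processes it starts, which is why it has to be set again in every new session or boot. Under systemd, the LimitNOFILE= setting in the unit file does the same job without a wrapper.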

Lorenzo Miniero

Dec 23, 2016, 6:18:27 AM
to meetecho-janus
To generate core dumps, e.g.:

Anyway, as I said, on some systems that's not enough.

L.

Lorenzo Miniero

Dec 23, 2016, 6:19:43 AM
to meetecho-janus
Yes, if the "too many open files" is the crash cause, then it's addressed in the FAQ, #25, and ulimit is what we also mention:

L.

Igor Khomenko

Dec 23, 2016, 6:53:32 AM
to meetecho-janus
OK, on Ubuntu 16.04 I get crash files here:

/var/crash/_opt_janus_bin_janus.0.crash

I'm trying to understand how to read this file, because gdb /path/to/bin/janus /path/to/coredump does not work on it.
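
Files under /var/crash on Ubuntu are Apport reports rather than raw core files, which would explain why gdb rejects them. A possible approach, sketched under the assumption that the apport-unpack tool is installed, is to extract the report first; unpack_crash is a hypothetical helper name.

```shell
# Extract an Apport .crash report into a directory and print the path of the
# core file inside it. $1 = .crash file, $2 = output directory (new or empty).
# (unpack_crash is a hypothetical helper name.)
unpack_crash() {
    uc_report=$1
    uc_outdir=$2
    if [ ! -r "$uc_report" ]; then
        echo "cannot read crash report: $uc_report" >&2
        return 2
    fi
    if ! command -v apport-unpack >/dev/null 2>&1; then
        echo "apport-unpack is not installed" >&2
        return 127
    fi
    # apport-unpack writes one file per report field; the core is "CoreDump".
    apport-unpack "$uc_report" "$uc_outdir" && echo "$uc_outdir/CoreDump"
}

# Example usage with the file named in this post:
#   core=$(unpack_crash /var/crash/_opt_janus_bin_janus.0.crash /tmp/janus-crash)
#   gdb /opt/janus/bin/janus "$core"
```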

Ancor Gonzalez Sosa

Dec 24, 2016, 3:44:39 AM
to meetecho-janus
I see you have fine-tuned your systemd service a lot. Still, I am missing this line, which prevents systemd from killing your Janus if it creates too many threads:

TasksMax=infinity

That feature was introduced in systemd 228 and works on systems with Linux kernel >= 4.3.

I plan to open a pull request to include that line here https://github.com/meetecho/janus-gateway/blob/29c314fbc6d411968effee3e63ead6eb77ce4717/mainpage.dox#L2147
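
Until such a change lands, the line can be added locally through a systemd drop-in instead of editing the unit file itself. A minimal sketch: write_tasksmax_dropin and the file name tasksmax.conf are hypothetical, and the directory in the usage comment assumes the unit is called janus.service.

```shell
# Write a drop-in file that sets TasksMax=infinity for a unit.
# $1 = drop-in directory, e.g. /etc/systemd/system/janus.service.d
# (write_tasksmax_dropin and tasksmax.conf are hypothetical names.)
write_tasksmax_dropin() {
    wt_dir=$1
    mkdir -p "$wt_dir" || return 1
    cat > "$wt_dir/tasksmax.conf" <<'EOF'
[Service]
TasksMax=infinity
EOF
}

# Example usage (as root):
#   write_tasksmax_dropin /etc/systemd/system/janus.service.d
#   systemctl daemon-reload
#   systemctl restart janus.service
```

On systemd versions that predate the setting, the line should simply be ignored with a warning in the journal, so it is worth checking systemctl --version first.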
 