Out of memory: Kill process janus

konstant...@gmail.com

Nov 26, 2017, 8:30:06 AM
to meetecho-janus
Hi, 

CentOS kills Janus because of a lack of memory.
I wonder why this is happening, since the server has 3.75GB of RAM,
only one plugin enabled,
only one transport in use (REST),
and there is almost no activity...

State of the server while Janus is running:
CPU: 4-6%
Mem: 592M/3.5G

I also see other errors:

[8729642.563577] traps: iceloop 8856753[12291] general protection ip:4396b6 sp:7f77227d3230 error:0 in janus[400000+71000]
[8889522.642217] audiobridge han[12958]: segfault at 0 ip           (null) sp 00007f7212ffc428 error 14 in janus[400000+71000]

Mirko Brankovic

Nov 26, 2017, 10:09:03 AM
to meetecho-janus
I have also encountered an ICE loop memory leak in the AudioBridge.

--
You received this message because you are subscribed to the Google Groups "meetecho-janus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to meetecho-janus+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lorenzo Miniero

Nov 27, 2017, 5:24:55 AM
to meetecho-janus
Not aware of any memory leak, personally. If you guys can see if Valgrind or AddressSanitizer spot something for you, I'll be glad to have a look.

L.

Mirko Brankovic

Nov 27, 2017, 8:11:28 AM
to meetecho-janus
It is still on my TODO list, since I'm not sure in which scenario it appears.
My suspicion is that it happens when users leave and re-enter many times..... most likely the race condition on hangup, as explained in:

So I guess this will be the cure:

--
Regards,
Mirko

konstant...@gmail.com

Nov 27, 2017, 10:53:26 AM
to meetecho-janus
my suspect is when users are leaving and re-entering many times

I'm also thinking along these lines:
after re-entering the same room a few times over a while, the process gets killed (within about an hour).



On Sunday, November 26, 2017 at 3:30:06 PM UTC+2, konstant...@gmail.com wrote:

Kaplan

Mar 5, 2018, 7:57:50 AM
to meetecho-janus
I am experiencing the same issue now! I need help figuring out what I can do.
The Janus servers where I run the AudioBridge are all running out of memory within hours; Janus memory keeps growing and growing.
On one server I am on Janus master 0.3.0 8fdd7cd6301fa41dffcf44bcc487bb4baae6a941.
I restarted Janus at 3am and by 7am it was already out of memory, maybe with a single person connecting/reconnecting...

Any help appreciated...

Kaplan

Mar 5, 2018, 8:35:03 AM
to meetecho-janus
I just disabled the firewall on this server (the one running the AudioBridge) and it seems to be stable now, so something in the ICE negotiation was not working well.
Since this server was on master 8fdd7cd6301fa41dffcf44bcc487bb4baae6a941 from a few days ago, I assume the fix was already merged: https://github.com/meetecho/janus-gateway/pull/1035

So something else must be happening... I will investigate and post here.

Kaplan

Mar 5, 2018, 10:16:06 AM
to meetecho-janus
Testing the latest master as of this second (v0.3.1): memory is still increasing on the Janus server by the minute, with nothing much going on on the server...

Getting desperate over here, not sure what to try; any help appreciated...

Mirko Brankovic

Mar 5, 2018, 10:36:18 AM
to meetecho-janus
I guess the only way would be to run Janus within Valgrind or some other tool, repeat the test, and look at the report of lost memory.

--
Regards,
Mirko

Kaplan

Mar 5, 2018, 10:36:44 AM
to meetecho-janus
Here is a top output showing the threads; it seems like the "mixer" in the audio room is taking up the memory:


Has anybody else seen this?

Lorenzo Miniero

Mar 5, 2018, 10:39:22 AM
to meetecho-janus
No, that's the memory the whole of Janus is taking. You can see how all the thread pids associated with Janus have the same value. Mixer is on top simply because it's using more CPU.

L.

Mirko Brankovic

Mar 5, 2018, 10:40:46 AM
to meetecho-janus
I had a crash caused by an AudioBridge room when I was testing with 30-35 users, but haven't done any debugging since it wasn't on my priority list :D

--
Regards,
Mirko

Kaplan

Mar 5, 2018, 11:14:43 AM
to meetecho-janus
This is extremely weird, as it started happening across different servers, on different versions of Janus (0.2.5, 0.3.0, and now the latest 0.3.1).
The only thing they have in common is that these servers run the AudioBridge, and I RTP-forward the AudioBridge output to other Januses... It has been working fine for over a year (the one on 0.2.5);
now it won't last more than a few hours before the whole machine runs out of memory.

Very weird.
One thing I have been doing with my janus_audiobridge.c (for a year, without issues) is changing this from 6 to 24 and recompiling:
#define DEFAULT_PREBUFFERING    24 // <-- this is 6 in the code

Going nuts...
Oh, and these are all production servers :(

Lorenzo, any idea what I can try to help debug?

Kaplan

Mar 5, 2018, 11:16:03 AM
to meetecho-janus
Oh, I see that you posted above about using Valgrind and AddressSanitizer, sorry, I missed that...

Kaplan

Mar 5, 2018, 11:20:15 AM
to meetecho-janus
Maybe I can try the ref counter branch ;) I thought 0.3.1 had it merged in...

Kaplan

Mar 6, 2018, 7:36:42 AM
to meetecho-janus
Right now I have a Janus server where the memory footprint of Janus keeps growing and growing, with nobody connected to it (only a node controller user via the HTTP transport) and with the audio rtp_forwarded to multiple servers (just silence, as nobody with media is connected). The only thing Janus is doing is running the mixer for the room. I don't see anything in the logs (but I am on level 4); I will try to turn the level up and see. Driving me crazy :)

Kaplan

Mar 6, 2018, 7:48:19 AM
to meetecho-janus
I will try to learn the dark arts of Valgrind, never used it before...

Kaplan

Mar 7, 2018, 8:15:13 AM
to meetecho-janus
UPDATE: Resolved??
Hopefully this will help others with the AudioBridge. What I think was happening to me is this: I have a nodeJS client that joins the audio room just to get events, but it's a "headless" user with no media. With that user connected, I had situations where the Janus memory kept increasing without any real load on the server (no users), just RTP-forwarding the output of the AudioBridge with the "headless" user connected.
I changed my code so that the nodeJS user no longer joins the room, and so far so good. I will post back later today to confirm.


On Tuesday, March 6, 2018 at 7:36:42 AM UTC-5, Kaplan wrote:

Lorenzo Miniero

Mar 7, 2018, 8:31:05 AM
to meetecho-janus
That's weird, a user that doesn't send anything should have no impact at all, as we never get any RTP packet from them to decode and mix. It may be a red herring: is the same happening when you create a room, RTP forward it, and no one's in?

L.

Kaplan

Mar 7, 2018, 9:19:47 AM
to meetecho-janus
The node user joins the audio room but, AFAIK, does not create an offer or anything; basically I am using the HTTP API to:
1) create the room if it does not exist 
2) join the room
3) rtp forward the audio to a bunch of other Januses...

I never create an offer, or anything else. Both machines are behind a firewall now (that was the only change I made that I can think of).

Last night, I changed my code to remove 2) above, so I just create the room from node and RTP-forward; the node user no longer joins the room. So far the memory in Janus has not increased. Before, by now, it would be around 1.25GB... Will keep an eye on it and report...
Same 4 audio rooms running; memory is tiny compared to yesterday.

Lorenzo Miniero

Mar 7, 2018, 11:18:13 AM
to meetecho-janus
If it's not creating an offer I'm again not sure it's the culprit... that's why I was interested to understand if the cause was RTP forwarding with no one in, which would be the equivalent of your fake user with no connection being the only one in the room. I made a few tests myself and I definitely could not find any leak in the AudioBridge.

L.

Lorenzo Miniero

Mar 7, 2018, 11:19:41 AM
to meetecho-janus
Unless maybe the AudioBridge sees the participant as if he were in, creates an RTP packet, tells the core to send it, and then the core dumps it because there's actually no WebRTC PeerConnection available. Maybe we have something there that leaks in that scenario? I'll have to check.

L. 

Kaplan

Mar 7, 2018, 11:30:15 AM
to meetecho-janus
Thanks Lorenzo,
It could be a race condition that triggers some sort of loop. I recompiled Janus with AddressSanitizer and have been running it on a test server, but I am unable to see anything.
What I can tell you is this: after I made that change, 2 servers that had the issue (nobody real in the AudioBridge but the node user) and whose memory kept growing now exhibit perfect behavior. I restart my Janus server at 3am, and sometimes, with the behavior before my change, within a few hours the server would already be using 1.2GB of RAM and constantly growing. Now it's almost midday and there are no issues; I will keep investigating and let you know...

Lorenzo Miniero

Mar 7, 2018, 11:32:21 AM
to meetecho-janus
As a side note, if you were using that fake user just as a way to be notified about joins/leaves on the server side, you may want to start using the Event Handlers instead. As part of the plugin-specific events, you'll get notifications from the AudioBridge plugin on everything that happens there.

L.

Bellesoft Consulting

Mar 7, 2018, 12:41:45 PM
to Lorenzo Miniero, meetecho-janus
Yup, thanks
--
You received this message because you are subscribed to a topic in the Google Groups "meetecho-janus" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/meetecho-janus/b8Ks2-2Ozec/unsubscribe.
To unsubscribe from this group and all its topics, send an email to meetecho-janu...@googlegroups.com.

Lorenzo Miniero

Mar 7, 2018, 1:16:37 PM
to meetecho-janus
I think what I thought about was it. From the mixer, we don't send a packet to participants right away: we only enqueue it in a queue belonging to them, and it's their thread that iterates on those packets, encodes them, and passes them to the core. Under certain conditions, it may happen that we start enqueueing packets, but the thread never gets to the point where it pops packets from the queue, and so it piles up. My guess is that your fake user was indeed triggering that condition, even though I don't have a clear idea how.

Any chances you can share what you were using to create the dummy user, so that we can try and replicate?

L.

Bellesoft Consulting

Mar 7, 2018, 4:07:43 PM
to Lorenzo Miniero, meetecho-janus
Yes, I am traveling now, but will post the code when I get to my computer.

Thanks for looking into it! Yes, it sounds like you are on the right track: it looks like a queue growing and growing.

Kaplan

Mar 7, 2018, 4:55:05 PM
to meetecho-janus
Hi Lorenzo,
I emailed you the code...
Thanks for looking into it. I did what you suggested and no longer join the audio room just to get events. Fingers crossed, not a single issue so far since my code change.

Lorenzo Miniero

Mar 8, 2018, 6:30:35 AM
to meetecho-janus
This should be solved here:

As part of this fix, I also added a new property called "setup" to all the events related to participants joining and leaving. When setup=false, it means that this specific participant has joined, but doesn't have a valid PeerConnection (yet), and as such can't hear anything, nor speak of course. When a new event with setup=true arrives, then the PeerConnection was successfully established for that participant. It might be helpful for updating the UI of applications using the AudioBridge: in the demo, we just show an "unlinked" icon next to the name when there's no PC.

L.

Kaplan

Mar 8, 2018, 7:52:29 AM
to meetecho-janus
Thanks Lorenzo for all the great work you do!
I will test this now and let you know. I guess the issue could also have been triggered by a real user who simply does not have a mic, or who denies the "Allow chrome to access your mic" prompt while joining the audio bridge?

Selfish request: while you have janus_audiobridge.c open, would it be hard to add a config option for DEFAULT_PREBUFFERING (https://github.com/meetecho/janus-gateway/blob/fa6c5fdbebf95f3e762314fc910b0fc492611558/plugins/janus_audiobridge.c#L1006)? It's hardcoded to 6 packets; on all my Janus servers I keep changing it to 24, and that has proven to provide much better audio quality on slower or choppy networks. No big deal...

Lorenzo Miniero

Mar 8, 2018, 9:32:07 AM
to meetecho-janus
Adding to the TODO list but no idea on when it will be done honestly, just too busy these days.

L.

Kaplan

Mar 8, 2018, 9:53:12 AM
to meetecho-janus
No problem, I have a sed script to change it from 6 to 24.

One thing I noticed while testing this commit is that "sampling_rate" in the create command is not working, or maybe I am doing it wrong:

Lorenzo Miniero

Mar 8, 2018, 9:57:11 AM
to meetecho-janus
In the API it's "sampling", not "sampling_rate". If you think this is confusing, it is, because I had to look it up too :-)

L.

Kaplan

Mar 8, 2018, 10:11:07 AM
to meetecho-janus
Cool, so far so good. I am going to put this commit in prod now, but I can't restart Janus until tonight; I will test it tomorrow and through the weekend ;)

So would the issue with no media from the headless node user also have affected users who failed to negotiate audio on the bridge?

Lorenzo Miniero

Mar 8, 2018, 10:15:43 AM
to meetecho-janus
Yep.

L.

Kaplan

Nov 12, 2018, 9:35:14 AM
to meetecho-janus
Hi Lorenzo,
Thanks again for all the great work you do.
I am revisiting this thread because, even though you fixed the issue I reported long ago in this commit (https://github.com/meetecho/janus-gateway/commit/fa6c5fdbebf95f3e762314fc910b0fc492611558), I never actually made the nodeJS user join the audio room again after that :) Over the weekend I deployed changes to my nodeJS code, and now the node server joins the Janus audio room to receive events (this is a headless user with no media). This time I am not interested in user join/leave, but in "talking"/"stop talking" events.

I am writing here because I am seeing the Janus memory on the servers grow... although it's too early to tell whether it's a leak as severe as the one I initially reported with the headless user. Just wondering if a headless user with no media could somehow "leak" when only receiving events. I do see that when the node user joins, the "setup=false" field you created as part of the original fix is there. But is there a chance something is not being cleaned up? On most servers I am on v0.4.5, but on one of them I am still on v0.4.0, and that particular server's memory is growing the fastest...
Thanks.

Kaplan

Nov 12, 2018, 10:02:50 AM
to meetecho-janus
Sorry, I just reread what I posted and I am not sure it was clear. I do see the Janus process memory growing on the servers, but I am not at all sure it's a leak; it might be normal. So far it's too early for me to tell, as Linux has not killed the process like last time. I will keep an eye on it and report; sorry, I got a little paranoid :)

Bellesoft Consulting

Nov 12, 2018, 5:24:22 PM
to meetech...@googlegroups.com
At the end of the day, all the v0.5.4 Januses were fine. The only Janus server with a growing memory footprint turned out to be the v0.4.0 one (which, according to the releases page, is not a good one ;) So move along folks, nothing to see here. Sorry, and thanks.
