When a session ID is not destroyed, the Janus server cannot accept HTTP requests


Wilbert Jackson

Nov 29, 2014, 9:10:02 AM
to meetech...@googlegroups.com
Lorenzo

After testing the echotest and videocall demos on three machines, I have noticed that if a session is not destroyed by the Janus server, new create-session requests are not processed. tcptrack or netstat shows a lot of TCP connections to the server stuck in SYN_SENT, and the server never receives the new session create request. Restarting the server is the only way I have found to get it to accept new session requests again.
In my tests, when the app's ajax POST destroy message fails to be delivered, the server ends up out of sync with incoming TCP requests.

Server state when the POST destroy request fails
No WebRTC media anymore
[1680855497] ICE send thread leaving...
Detaching handle from JANUS EchoTest plugin
Handle detached (0), scheduling destruction
Destroying session 3378589623
[1680855497] Destroying SCTP association
[1680855497] WebRTC resources freed
[1680855497] Handle and related resources freed
[1680855497] Leaving SCTP association thread
No WebRTC media anymore
[234741942] ICE send thread leaving...
[234741942] Destroying SCTP association
[234741942] WebRTC resources freed
[234741942] Leaving SCTP association thread
Creating new session: 1162020257
Timeout expired for session 4076769303...
Cleaning up session 4076769303...
Destroying session 4076769303
Detaching handle from JANUS EchoTest plugin
Handle detached (0), scheduling destruction

Server state when the POST destroy request succeeds
[WARN] [286935095]    Unsupported transport tcp!
[286935095] The DTLS handshake has been completed
WebRTC media is now available
[286935095] Starting thread for SCTP association
[286935095] Started thread: setup of the SCTP association
[ERR] [sctp.c:janus_sctp_association_setup:264:] [286935095] Error connecting to SCTP server at port 5000
No WebRTC media anymore
[286935095] ICE send thread leaving...
Detaching handle from JANUS EchoTest plugin
Handle detached (0), scheduling destruction
Destroying session 3413917587
[286935095] Destroying SCTP association
[286935095] WebRTC resources freed
[286935095] Handle and related resources freed
[286935095] Leaving SCTP association thread


When running the test on my Nexus tablet connected to a wireless network, the POST destroy call sometimes fails and the server does not destroy the session ID. Subsequent connection attempts fail with an "is the server down" message. In some cases the Chrome browser becomes unresponsive.
$.ajax({
    type: 'POST',
    url: server + "/" + sessionId,
    async: syncRequest, // Sometimes we need false here, or destroying in onbeforeunload won't work
    cache: false,
    contentType: "application/json",
    data: JSON.stringify(request),
    success: function(json) {
        Janus.log("Destroyed session:");
        Janus.log(json);
        sessionId = null;
        connected = false;
        if(json["janus"] !== "success") {
            Janus.log("Ooops: " + json["error"].code + " " + json["error"].reason); // FIXME
        }
        callbacks.success();
        gatewayCallbacks.destroyed();
    },
    error: function(XMLHttpRequest, textStatus, errorThrown) {
        Janus.log(textStatus + ": " + errorThrown); // FIXME
        // Reset everything anyway
        sessionId = null;
        connected = false;
        callbacks.success();
        gatewayCallbacks.destroyed();
    },
    dataType: "json"
});
}

Lorenzo Miniero

Nov 29, 2014, 9:59:21 AM
to meetech...@googlegroups.com
Hi,

not destroying sessions is entirely ok. It's absolutely normal to have multiple sessions going on at the same time, that's the whole purpose of the gateway. Besides, there's an internal watchdog that keeps track of sessions and gets rid of them when there's a timeout or the user who originated it has disappeared.

Not replying to SYN is not something my code can cause, as the web server is provided by a different library with a threading mechanism of its own. Are you wrapping the Janus web server somehow, e.g., with a frontend like Apache HTTPD, NGINX or others? Or are your applications contacting Janus directly on the port it's listening on (e.g., 8088 by default)? If it's the former, the issue may be somewhere there.

L.

Lorenzo Miniero

Nov 29, 2014, 10:02:57 AM
to meetech...@googlegroups.com
If you believe it's an issue with sessions and want to check if there's a deadlock there, replace all the locks on the sessions_mutex with the debug variant, which will show you on the console every time a lock or unlock is attempted. A series of locks with no unlock will tell you if there's been a deadlock somewhere.

janus.c: replace all
 
janus_mutex_lock(&sessions_mutex);

with
 
janus_mutex_lock_debug(&sessions_mutex);

and all
 
janus_mutex_unlock(&sessions_mutex);

with
 
janus_mutex_unlock_debug(&sessions_mutex);
 
 
L.

Wilbert Jackson

Nov 29, 2014, 10:54:30 AM
to meetech...@googlegroups.com
Not sure there is an issue with sessions, but I am seeing the server not respond to any HTTP request right after a client network HTTP connection fails. In the screenshot, this server state is repeated each time a destroy session request fails to be sent by the client. The SYN_SENT packets are all Janus server TCP packets and occur right after the ajax destroy request. Also, the destroying session messages shown below, which I see on a successful destroy, are missing.
I sent you my janus.cfg file. Have I done something wrong in configuring the server? I can definitely observe over repeated tests that the server hangs when destroy messages are not received.

Destroying session 3413917587
[286935095] Destroying SCTP association
[286935095] WebRTC resources freed
[286935095] Handle and related resources freed
[286935095] Leaving SCTP association thread



Lorenzo Miniero

Nov 29, 2014, 10:59:02 AM
to meetech...@googlegroups.com
What is port 7090? Are you using HTTPS as in your previous configurations? That's what the secure_port is set to, if I read it right.
Have you tried if this happens with plain HTTP as well? Or are you sending HTTP requests to the HTTPS web server..?

L.

Wilbert Jackson

Nov 29, 2014, 12:27:31 PM
to meetech...@googlegroups.com

Port 7090 is the port I have assigned to the Janus web server. My HTML server is on port 3004. I use HTTPS for both servers, and I have not tried the HTTP Janus server. I have used port 7090 for a couple of days of testing, and was originally using ports 8080 and 8088. I changed ports to see if there were any port-related conflicts. All ports used had the same problem. For the web server the config settings are:

http = no ; Whether to enable the plain HTTP interface
port = 8088 ; Web server HTTP port
https = yes ; Whether to enable HTTPS (default=no)
secure_port = 7090 ; Web server HTTPS port, if enabled
ws = no ; Whether to enable the WebSockets interface
ws_port = 8188 ; WebSockets server port
ws_ssl = no ; Whether to enable secure WebSockets
;ws_secure_port = 8989; ; WebSockets server secure port, if enabled

Lorenzo Miniero

Nov 29, 2014, 12:31:53 PM
to meetech...@googlegroups.com
Please check if you're experiencing issues with plain HTTP as well, that is, if the web server stops responding when you do your tests using the HTTP web server only (port 8088). IIRC there were some issues with the integrated HTTPS support in libmicrohttpd, mostly with very high CPU usage after the first handled request was closed, which may be related to, or indeed the cause of, the issue you're getting. Apparently this is not happening in all MHD deployments, and seems to depend on the version shipped by some distributions, as for some users it works fine.

If it works fine with plain HTTP and you care for HTTPS, check the "Deploy" section of the Janus documentation to see how you can proxy requests through a frontend.

Lorenzo
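One way to run the check suggested above is a small Node.js probe that sends a "create" request to the plain HTTP interface and reports whether the server answers within a timeout. This is only a sketch under stated assumptions: the host, port 8088, the /janus path and the helper names (buildCreateRequest, probeJanus) are illustrative and not part of Janus itself.

```javascript
const http = require("http");

// Build the JSON body of a Janus "create" request.
// The transaction value is an arbitrary string that the server echoes back.
function buildCreateRequest(transaction) {
  return JSON.stringify({ janus: "create", transaction: transaction });
}

// Hypothetical helper: POST a "create" request and call done(err, reply).
// Assumed setup: plain HTTP, API mounted at /janus.
function probeJanus(host, port, timeoutMs, done) {
  const body = buildCreateRequest("probe-" + Date.now());
  const req = http.request(
    {
      host: host,
      port: port,
      path: "/janus",
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
      },
    },
    (res) => {
      let text = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { text += chunk; });
      res.on("end", () => {
        let reply;
        try {
          reply = JSON.parse(text);
        } catch (err) {
          return done(err);
        }
        done(null, reply);
      });
    }
  );
  // A server that does not answer in time is reported as an error.
  req.setTimeout(timeoutMs, () => req.destroy(new Error("timed out")));
  req.on("error", (err) => done(err));
  req.end(body);
}
```

Calling probeJanus("127.0.0.1", 8088, 3000, callback) before and after a failed destroy would show whether the plain HTTP server keeps answering; a healthy server is expected to reply with "janus": "success".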

Wilbert Jackson

Nov 29, 2014, 1:42:24 PM
to meetech...@googlegroups.com
Thanks, will try that setup.

Wilbert Jackson

Nov 30, 2014, 8:20:22 AM
to meetech...@googlegroups.com
Lorenzo,

The high CPU usage when using libmicrohttpd was the problem: threads were consuming 100% of the CPU. I changed the server to use HTTP instead of HTTPS and retested connecting and disconnecting apps as before. I have not seen any of the earlier failures in destroying the plugin.

Thanks for your help.
 


Lorenzo Miniero

Nov 30, 2014, 8:31:38 AM
to meetech...@googlegroups.com
Good to know a solution was found eventually!

Lorenzo

Wilbert Jackson

Dec 2, 2014, 6:09:37 AM
to meetech...@googlegroups.com
Lorenzo,

Since I am running the Janus Gateway with the HTTP server, how can I run a screen sharing service if it requires an HTTPS connection?

wilbert

Nicholas Wylie

Dec 2, 2014, 6:22:05 AM
to meetech...@googlegroups.com
Hi Wilbert,


I've set up nginx as a reverse proxy for Janus. nginx serves the web content, and forwards all connections to "https://server.com/janus" through to the gateway running on the localhost.

It works quite well. nginx is able to handle all the SSL stuff, and I don't have to worry about CORS (which was causing issues in Firefox).

Wilbert Jackson

Dec 2, 2014, 7:13:16 AM
to meetech...@googlegroups.com
Thanks,

In the design below, is the Janus web server running over HTTPS on the localhost? If so, how does that get around using libmicrohttpd, which has the high CPU usage problem? My current setup is a node.js server serving web content and forwarding requests to the Janus web server. I have changed the node.js server to serve over HTTP rather than HTTPS so that it can forward the HTTP requests to the Janus web server.

> I've set up nginx as a reverse proxy for Janus. nginx serves the web content, and forwards all connections to "https://server.com/janus" through to the gateway running on the localhost.

Lorenzo Miniero

Dec 2, 2014, 7:14:35 AM
to meetech...@googlegroups.com
Wilbert,

the idea is that you talk HTTPS to nginx, apache or others, not Janus. The frontend then proxies the requests/responses to/from Janus using plain HTTP.

L.
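Since a node.js frontend was mentioned earlier in the thread, the idea can be sketched in Node.js as follows: the frontend speaks HTTPS to browsers and relays each request to Janus over plain HTTP. This is a minimal illustration, not a production proxy; the upstream address 127.0.0.1:8088 and the function names (buildUpstreamOptions, createFrontend) are assumptions, and nginx or Apache would achieve the same through their own proxy configuration.

```javascript
const http = require("http");
const https = require("https");

// Assumption: Janus serves plain HTTP on this address.
const JANUS_UPSTREAM = { host: "127.0.0.1", port: 8088 };

// Map an incoming HTTPS request onto options for the plain-HTTP upstream request.
function buildUpstreamOptions(req, upstream = JANUS_UPSTREAM) {
  return {
    host: upstream.host,
    port: upstream.port,
    method: req.method,
    path: req.url,
    // Forward the original headers, but point Host at the upstream server.
    headers: Object.assign({}, req.headers, {
      host: upstream.host + ":" + upstream.port,
    }),
  };
}

// Create (but do not start) the HTTPS frontend.
// tlsOptions must carry the certificate, e.g. { key: ..., cert: ... }.
function createFrontend(tlsOptions, upstream = JANUS_UPSTREAM) {
  return https.createServer(tlsOptions, (clientReq, clientRes) => {
    const upstreamReq = http.request(
      buildUpstreamOptions(clientReq, upstream),
      (upstreamRes) => {
        // Relay status, headers and body back to the browser unchanged.
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );
    upstreamReq.on("error", () => {
      // Janus unreachable: answer with 502 Bad Gateway.
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end();
    });
    clientReq.pipe(upstreamReq);
  });
}
```

With this layout only the frontend needs a certificate, and the HTTPS support built into Janus stays disabled.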

Wilbert Jackson

Dec 2, 2014, 7:46:07 AM
to meetech...@googlegroups.com
OK, so the "https://server.com/janus" in the statement below would be "http://server.com/janus"?

> I've set up nginx as a reverse proxy for Janus. nginx serves the web content, and forwards all connections to "https://server.com/janus" through to the gateway running on the localhost.

Lorenzo Miniero

Dec 2, 2014, 7:59:36 AM
to meetech...@googlegroups.com
Yep.

Nicholas Wylie

Dec 2, 2014, 4:26:35 PM
to meetech...@googlegroups.com
Sorry if I was unclear.

That is pretty much how I have things set up. The only difference is that I have Janus listening on the loopback interface, "http://127.0.0.1:8088".

This just means that the gateway cannot be accessed from outside the server without going through nginx.

Joseph Ridgway

Jan 14, 2015, 3:00:48 AM
to meetech...@googlegroups.com
Is it possible to proxy WebSockets? I'm trying to make this work with wss and nginx, with nginx terminating the wss and relaying ws to Janus on a separate server. Should I stick with plain http/s?

I'm trying to make this work after realizing that browsers don't like the self-signed certificates that Janus comes with. I wish I could simply throw Janus behind my site's ssl, but I'm running Janus on dynamically created ec2 instances that need to be up and running in a moment.

Nicholas Wylie

Jan 14, 2015, 4:30:46 AM
to meetech...@googlegroups.com
Yep, it's possible.

Joseph Ridgway

Jan 14, 2015, 8:49:49 AM
to meetech...@googlegroups.com
Thanks, Nicholas. I have websockets proxied just as in the article you linked to. I think it's working properly, except audio/video feeds aren't coming through. Here's the end of my log in case you see something:

WebSocket onopen: #17
Joining WebSocket thread: #17
Creating new session: 285697201
Creating new handle in session 285697201: 1783852675
[1783852675] There's a message for JANUS VideoRoom plugin
[1783852675] There's a message for JANUS VideoRoom plugin
[1783852675] Creating ICE agent (controlled mode)
[1783852675] ICE send thread started...
[WARN] [1783852675] Queueing trickle candidate, status is not START yet
[WARN] [1783852675] Queueing trickle candidate, status is not START yet
[1783852675] Done! Ready to setup remote candidates and send connectivity checks...
[WARN] [1783852675]    Unsupported transport tcp!
[WARN] [1783852675]    Unsupported transport tcp!
No more remote candidates for handle 1783852675!

Nicholas Wylie

Jan 14, 2015, 4:44:33 PM
to meetech...@googlegroups.com
Looks like the WebSocket connection is working.

Did you have the audio/video feeds working before proxying the websocket connections, and have you got the public IP configured in Janus?

Joseph Ridgway

Jan 14, 2015, 4:50:47 PM
to Nicholas Wylie, meetech...@googlegroups.com
I did indeed. I had audio and video working, then went to add screen sharing with SSL and now I'm stuck with all the proxying. 

The Janus servers that I am proxying to are on separate machines. My understanding is that I would have to proxy all UDP traffic on a specific port/range in order to get this working. Am I correct? 

Right now I'm trying to get this working purely with DNS changes and bypassing the proxy method altogether. It seems simple enough but the SSL handshake isn't going through.


--
Joseph Ridgway, CEO
4Good

Nicholas Wylie

Jan 14, 2015, 5:11:24 PM
to meetech...@googlegroups.com, now...@gmail.com
> The Janus servers that I am proxying to are on separate machines. My understanding is that I would have to proxy all UDP traffic on a specific port/range in order to get this working. Am I correct?

Not quite.

The WebSocket connection you had configured is only the signalling layer of the WebRTC stack. Have a look at this link for an overview of the different parts involved in the WebRTC stack.

Essentially, think of the server running Janus as one peer and the client connecting to Janus as another peer.
The two peers use the signalling layer (Janus API over WebSockets in this case) to establish an RTC session between themselves.
As part of the session establishment process a peer gathers candidate addresses and then forwards them over the signalling layer to the other peer. The other peer then uses these addresses to attempt to connect to the first peer and repeats the process in reverse (sending its own candidate addresses back to the original peer). For more information on all the moving parts, have a look into ICE (Interactive Connectivity Establishment), STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT).

So you don't need to proxy any UDP traffic, because the two peers (Janus and your client) should be exchanging data directly. If they are unable to exchange packets, it's because the session establishment was not successful. In this instance you might need to look at setting up a TURN server.
Generally speaking if Janus is running on a server with a public IP address and configured correctly, you won't have any problems. (Unless you have a very restrictive firewall in between your peers somewhere)
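
To make "candidate addresses" concrete, here is a small sketch that parses the leading fields of an ICE candidate line, following the candidate attribute syntax from the ICE specification (RFC 5245). The function name parseCandidate and the sample candidates are made up for illustration (the addresses come from documentation-only ranges); the tcp sample is presumably the kind of candidate behind warnings such as "Unsupported transport tcp!" in the logs earlier in the thread.

```javascript
// Parse the leading fields of an ICE candidate attribute (RFC 5245 syntax):
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
// Returns null when the string does not look like a candidate line.
function parseCandidate(line) {
  const text = line.trim().replace(/^a=/, "");
  if (!text.startsWith("candidate:")) return null;
  const parts = text.slice("candidate:".length).split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") return null;
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host, srflx, prflx or relay
  };
}

// Made-up sample candidates (not taken from the logs in this thread).
const samples = [
  "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
  "candidate:2 1 tcp 1518280447 192.0.2.10 9 typ host tcptype active",
  "candidate:3 1 udp 1686052607 203.0.113.5 61000 typ srflx raddr 192.0.2.10 rport 54321",
];

for (const sample of samples) {
  const c = parseCandidate(sample);
  console.log(c.type + " candidate over " + c.transport + " at " + c.address + ":" + c.port);
}
```

Each peer tries to reach the address and port in the other side's candidates directly, which is why the media path does not pass through the HTTP or WebSocket proxy.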

Joseph Ridgway

Jan 15, 2015, 7:40:49 PM
to meetech...@googlegroups.com, now...@gmail.com
Thank you very much for your help on this and for the background information. I initially had the echo test working on plain http and ws. I ran into my current issue when I tried to move to wss. This morning I decided to go back to ws where I had things working, and found out that I had the same issue with ws and http as well. Moreover, I've tried many times to start over from scratch on new servers and I can't get it working the way it was a few days ago. I'm not sure what I am doing wrong, or if it even is me. I have all ports open on the server. If I have the public_ip setting set in janus.cfg, I get a bunch of "DTLSv1_get_timeout" messages in my log with the log level set to 6. If I comment out the public_ip setting, I don't see the "DTLSv1_get_timeout" messages, but video still does not come through in the echo test and I see "Got a video candidate but we're bundling, ignoring..." in the log. Not sure what that means.

Lorenzo Miniero

Jan 16, 2015, 9:38:10 AM
to meetech...@googlegroups.com, now...@gmail.com
Just a quick response (still abroad, so can't look into anything ATM): the DTLSv1_get_timeout messages are not something to worry about. With "huge" debug logging enabled, the DTLS retransmission monitor just prints a lot of verbose info, including how much time it will take for the DTLS timeout to fire. So if you see a "DTLSv1_get_timeout: 3000", for instance, it just means that if the DTLS stack doesn't get a response in 3 seconds it will retransmit the following packet. So even in a normal situation where everything works fine you'll see it in a "huge" log.

Not seeing them at all with a huge log is actually a bad thing, because it means ICE did not complete, and so the DTLS handshake never started at all. Ignore the "bundling" messages; that's just verbosity and nothing you need to care about.

L.

Joseph Ridgway

Jan 16, 2015, 2:22:36 PM
to meetech...@googlegroups.com, now...@gmail.com
Just for reference (again), it turned out to be the latest version of OpenSSL that was my issue as mentioned in https://github.com/meetecho/janus-gateway/issues/132.

Lorenzo Miniero

Jan 20, 2015, 4:39:54 AM
to meetech...@googlegroups.com, now...@gmail.com
Just FYI, William actually found a solution to the issue for all openssl versions. I've just committed it on github.

L.