Getting a lot of long poll timeouts


Wilbert Jackson

Jan 11, 2015, 3:03:04 PM
to meetech...@googlegroups.com
Lorenzo,

I updated to the latest gateway version. I am running the Janus sample programs (echotest, videocall, videomcu) as confidence tests and am getting a lot of long poll connection network failures. The failures cause the app to stop, and the server becomes unstable afterwards because it cannot find the session ID to destroy. The failures seem to occur when the bandwidth is around 500 kbits/sec. I did not have this problem with the previous gateway version.

If I concentrate on just the echotest and run it from your site, it works fine, and I have run it for a couple of hours. However, running the same code from Apache and Node.js web servers stops after about 15 minutes, and sometimes sooner. The runs always stop due to a long poll timeout. I tried setting the long poll ajax timeout to different values, with no luck. Any suggestions?

When the problem occurs, the gateway log shows:

New connection on REST API: 108.20.20.72
Got a HTTP GET request on /janus/3886497332...
 ... Just parsing headers for now...
Got a HTTP GET request on /janus/3886497332...
 ... parsing request...
Session: 3886497332
Session 3886497332 found... returning up to 1 messages
... handling long poll...
Long poll time out for session 3886497332...
We have a message to serve...
{"janus" : "keepalive"}
Request completed, freeing data
New connection on REST API: 108.20.20.72
New connection on REST API: 108.20.20.72
Got a HTTP GET request on /janus/3886497332...
 ... Just parsing headers for now...
Got a HTTP GET request on /janus/3886497332...
 ... parsing request...
Session: 3886497332
Session 3886497332 found... returning up to 1 messages
... handling long poll...
Long poll time out for session 3886497332...
We have a message to serve...
{"janus" : "keepalive"}
Request completed, freeing data
New connection on REST API: 108.20.20.72
Got a HTTP GET request on /janus/3886497332...
 ... Just parsing headers for now...
Got a HTTP GET request on /janus/3886497332...
 ... parsing request...
Session: 3886497332
Session 3886497332 found... returning up to 1 messages
... handling long poll...
Long poll time out for session 3886497332...
We have a message to serve...
{"janus" : "keepalive"}
Request completed, freeing data
Timeout expired for session 3886497332...
Cleaning up session 3886497332...
Destroying session 3886497332
Detaching handle from JANUS VideoCall plugin
Removing user wj session...
No WebRTC media anymore
  -- Removed: 1
[1719927149] Adding event to queue of messages...
Handle detached (0), scheduling destruction
Checking 1 old sessions
Checking 1 old sessions
Checking 1 old sessions
Checking 1 old sessions


thanks

wilbert jackson

Lorenzo Miniero

Jan 12, 2015, 7:25:14 AM
to meetech...@googlegroups.com
A long poll timeout is a client-side thing that maybe our library doesn't handle properly. It just means that the long poll connection (the one for events) didn't get an answer in 60 (or 30? can't remember) seconds. On the server side there is only a check for activity: if no activity is detected in 60 seconds, the related session is closed. That's what the keep-alive messages are for.

If you're getting this when bandwidth usage increases, it means the available bandwidth may not be enough: UDP for media eats it all, and the TCP connections eventually time out because of that. Either that, or something in the frontends you use is causing the requests to time out. To handle that properly, try modifying the behaviour of the demo pages so that they don't fail when the long poll times out, but issue the request again. Also try issuing Janus keep-alive or "ping" requests to see what the HTTP round-trip time is for those: if they take a long time to complete (or they time out as well), then the bandwidth is the issue.
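A minimal sketch of the "issue the request again" idea, for illustration only. None of these names come from janus.js: `nextPollAction`, `pollLoop`, `requestEvents` and `onEvent` are hypothetical, and the limit of three retries is an assumption. The caller supplies `requestEvents`, a function that performs one long poll GET and returns a promise.

```javascript
// Hypothetical sketch, not the janus.js implementation.
const MAX_RETRIES = 3; // assumed limit on consecutive failed polls

// Decide what to do after one long poll attempt.
// outcome is "ok" (the server answered) or "failed" (timeout or network error).
function nextPollAction(retries, outcome, maxRetries = MAX_RETRIES) {
  if (outcome === "ok") {
    return { action: "poll", retries: 0 }; // success resets the counter
  }
  const failures = retries + 1;
  if (failures > maxRetries) {
    return { action: "give-up", retries: failures };
  }
  return { action: "poll", retries: failures }; // issue the request again
}

// Keep polling; a single failed long poll no longer stops the app.
async function pollLoop(requestEvents, onEvent) {
  let retries = 0;
  for (;;) {
    let event = null;
    let outcome = "ok";
    try {
      event = await requestEvents();
    } catch (err) {
      outcome = "failed";
    }
    if (outcome === "ok") {
      onEvent(event);
    }
    const next = nextPollAction(retries, outcome);
    retries = next.retries;
    if (next.action === "give-up") {
      throw new Error("long poll failed " + retries + " times in a row");
    }
  }
}
```

With this shape, an isolated timeout just triggers another GET, and only a run of consecutive failures is treated as fatal.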

L. 

Wilbert Jackson

Jan 12, 2015, 9:40:35 AM
to meetech...@googlegroups.com
Thanks,
Will try the suggestions.

On further testing with the echotest, we find that the server timeout occurs if the long poll ajax call does not return in less than 35 seconds. On one of the test computers the call always returns in the 30 second range. When running the echotest from your web site, the return is in the 30 second range and the app never times out. On a couple of our other computers (tablet, desktop and laptop), the long poll takes 40-60 seconds to return, the ajax GET call times out, and its status goes from pending to cancelled. We tried capping the bandwidth at 128 kbits/sec in the echotest, but have the same problem.

I don't believe it's a bandwidth problem related to our network or computers. Correct me if I am wrong, but if we run the echotest from your site and the long poll call latency stays at around 30 seconds on the problem computers, then our network and computers are providing the proper bandwidth. Would this be the correct analysis?

We are using the Google CDN to load and manage the jQuery ajax calls to reduce latency, but it does not help.

Lorenzo Miniero

Jan 12, 2015, 9:49:44 AM
to meetech...@googlegroups.com
I mentioned bandwidth because it's one of the most common causes of long poll timeout issues. The timeout value for the long poll event handler in janus.js is 60 seconds:


The server instead answers an event request within 30 seconds (with an empty response if no event is available, or with the events if there are any), so a 60s timeout on the client side should be plenty of time to cover slow networks. Besides, the code also tries to automatically retry up to three times if it gets a failure, which is what I suggested as a manual solution. To get a timeout in janus.js, it would have to take an additional 30s to physically send the request to the server and get the response back, and that would have to happen more than once. Such long latency times are not normal, and would be unfit for real-time media in general.
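To make the arithmetic above concrete, here is a small sketch that classifies a measured long poll duration. The 30s and 60s figures are the ones quoted in this post; the function name `classifyPoll` and the 5 second "slow" threshold are assumptions made for illustration and are not part of janus.js.

```javascript
// Hypothetical helper, not part of janus.js.
const SERVER_HOLD_MS = 30000;    // server answers a long poll within about 30 s
const CLIENT_TIMEOUT_MS = 60000; // long poll timeout on the client side
const SLOW_OVERHEAD_MS = 5000;   // assumed threshold for "suspiciously slow"

// elapsedMs: time from sending the GET to receiving the response.
function classifyPoll(elapsedMs) {
  // Anything beyond the server hold time is network or frontend overhead.
  const overheadMs = Math.max(0, elapsedMs - SERVER_HOLD_MS);
  if (elapsedMs >= CLIENT_TIMEOUT_MS) {
    return { status: "client-timeout", overheadMs: overheadMs };
  }
  if (overheadMs > SLOW_OVERHEAD_MS) {
    return { status: "slow", overheadMs: overheadMs };
  }
  return { status: "normal", overheadMs: overheadMs };
}
```

For example, a poll that returns after 45 seconds carries roughly 15 seconds of overhead on top of the server's hold time, which is already far from normal even though it is still under the client timeout.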

Or are you getting the error in other parts of the code?

L.

Wilbert Jackson

Jan 12, 2015, 10:18:26 AM
to meetech...@googlegroups.com
We only get the error when a GET long poll ajax request is made, on 3 of the 4 computers we use in the test. All are connected to the same 100 Mbits/sec LAN, which connects through an Internet router running at 65 Mbits/sec. The Ubuntu computer works OK, but the Windows desktop and Nexus Android computers do not. This used to work, and we tested the videomcu app for about a month without problems. The only changes we made recently were to upgrade our Internet connection speed from 25 Mbits/sec to 75 Mbits/sec and to pull the new version of the gateway.

thanks

Lorenzo Miniero

Jan 12, 2015, 10:28:24 AM
to meetech...@googlegroups.com
I guess you'll have to check with Wireshark or other tools what's causing specific requests to fail or timeout.

L.

Wilbert Jackson

Jan 14, 2015, 4:52:39 PM
to meetech...@googlegroups.com
Lorenzo,

Thanks for the great piece of software and technical architecture. I had a chance to dig into the code while trying to solve the HTTP timeout problem. It turned out to be a bad router that would go into a mode of retransmitting packets. We have now run our code, which is based on the videomcu plugin and adds user presence and call flow states, on the gateway serving 25 clients 24/7 for 4 days without problems.

Thanks

Wilbert Jackson