Delay in ICE candidate collection


Basar Daldal

Jan 20, 2015, 7:56:28 AM1/20/15
to discuss...@googlegroups.com
While trying to start or answer a call, it takes 10+ seconds to complete the ICE candidate collection. This happens when the PC has multiple network interfaces (VPN, WiFi, etc.) and some of those interfaces cannot reach the STUN server. As far as I can see, 10 seconds is the timeout after which Chrome decides to try the next interface. So, if you have two interfaces that cannot reach the STUN server, it takes 20 seconds; with three, 30 seconds; and so on.

I know that a solution here is to use Trickle ICE. However, I wonder if there is a plan to optimize this procedure; isn't it possible to start the procedure for all network interfaces at the same time, like threads running simultaneously? Or is there a way to change the 10 seconds to a lower value?

Thanks,
Basar

Simon Perreault

Jan 20, 2015, 9:04:58 AM1/20/15
to discuss...@googlegroups.com

On Tue, Jan 20, 2015 at 7:56 AM, Basar Daldal <basar....@gmail.com> wrote:
> While trying to start or answer a call, it takes 10+ seconds to complete the ICE candidate collection. This happens when the PC has multiple network interfaces (VPN, WiFi, etc.) and some of those interfaces cannot reach the STUN server. As far as I can see, 10 seconds is the timeout after which Chrome decides to try the next interface. So, if you have two interfaces that cannot reach the STUN server, it takes 20 seconds; with three, 30 seconds; and so on.
>
> I know that a solution here is to use Trickle ICE. However, I wonder if there is a plan to optimize this procedure; isn't it possible to start the procedure for all network interfaces at the same time, like threads running simultaneously? Or is there a way to change the 10 seconds to a lower value?

Interesting question. Theoretically I don't see any disadvantage with trying them all in parallel. I'd like to hear from the Chrome devs about this.

Simon

Emil Ivov

Jan 20, 2015, 9:45:42 AM1/20/15
to discuss...@googlegroups.com
From what we have seen, candidate gathering does happen in parallel on
all interfaces.

Big delays are still possible but only if the application waits for
all gathering to complete rather than sending candidates as they
become available (which is what Trickle ICE is about).

Emil

--
https://jitsi.org
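Emil's point — sending candidates as they become available instead of waiting for gathering to finish — looks roughly like this in browser JavaScript. This is only a sketch: the `signaling` object and its `send()` method are hypothetical stand-ins for whatever signalling channel the application uses.

```javascript
// Trickle ICE on the offerer side: send the offer immediately, then
// relay each candidate the moment the browser produces it.
// `signaling.send()` is a hypothetical signalling-channel method.
function startTrickleCall(pc, signaling) {
  pc.onicecandidate = (event) => {
    // event.candidate is null once gathering is complete.
    if (event.candidate) {
      signaling.send({ type: 'candidate', candidate: event.candidate });
    }
  };
  return pc.createOffer()
    .then((offer) => pc.setLocalDescription(offer))
    .then(() => signaling.send({ type: 'offer', sdp: pc.localDescription }));
}
```

The remote side applies the offer first and then feeds each incoming candidate to `addIceCandidate()` as it arrives, so gathering on the two ends overlaps with connectivity checks.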

Philipp Hancke

Jan 20, 2015, 10:30:14 AM1/20/15
to discuss...@googlegroups.com
From what I recall, the problem is largely determining when the request to the STUN server times out. Check for the error messages from https://code.google.com/p/chromium/codesearch#chromium/src/third_party/webrtc/p2p/base/stunport.cc&l=90

--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Simon Perreault

Jan 20, 2015, 10:47:52 AM1/20/15
to discuss...@googlegroups.com

On Tue, Jan 20, 2015 at 10:30 AM, Philipp Hancke <philipp...@googlemail.com> wrote:
> From what I recall, the problem is largely determining when the request to the STUN server times out.

If you fire a STUN request for each interface in parallel, you don't really care about timeout, do you? You just generate a server-reflexive candidate whenever a STUN response comes back. Or is there another reason why proceeding serially is desirable?

Same for TURN: you could send one Allocate request per TURN server out of each interface in parallel. Generate server-reflexive and relayed candidates whenever a response comes back.

Now, I haven't tested this scenario and maybe that's what Chrome has been doing all along... ;)

Simon
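Simon's parallel approach can be sketched as follows. The `stunQuery` function is a hypothetical per-interface STUN client (resolving with a server-reflexive address, rejecting on timeout); this is not actual Chrome code.

```javascript
// Fire one STUN request per interface in parallel. Each response that
// comes back yields a server-reflexive candidate immediately; interfaces
// that time out are simply skipped, without serialising the waits.
// `stunQuery` and `onCandidate` are hypothetical names for illustration.
function gatherReflexive(interfaces, stunQuery, onCandidate) {
  return Promise.allSettled(
    interfaces.map((iface) =>
      stunQuery(iface).then((addr) => {
        onCandidate(addr); // surface the candidate as soon as it arrives
        return addr;
      })
    )
  );
}
```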

Emil Ivov

Jan 20, 2015, 11:16:41 AM1/20/15
to discuss...@googlegroups.com
No, it hasn't. I have never seen Chrome do this. I don't believe it has
ever been the case, and I don't see anything in the code that makes me
think it can do so today.

Emil

--
https://jitsi.org

Justin Uberti

Jan 20, 2015, 8:52:13 PM1/20/15
to discuss-webrtc
We do all the requests in parallel. We wait somewhere around 9 seconds to give up, but you could certainly give up earlier if you want to (i.e. in your application).

Basar Daldal

Jan 21, 2015, 6:24:02 AM1/21/15
to discuss...@googlegroups.com
Thanks a lot for the comments. I don't remember exactly when I saw the 20-30 second ICE candidate collection times, but I ran a new test today and I see that the requests are done in parallel. The behavior may have changed some time ago without my noticing.

So, as Justin says, I think it is possible for an application to decide not to wait for the end of the ICE candidate collection process and just send the local SDP via the signalling channel to the remote end x seconds after the process starts.
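A minimal sketch of that "wait at most x seconds" approach, assuming only the standard `icegatheringstatechange` event and `iceGatheringState` property; the timeout value is an arbitrary example:

```javascript
// Resolve when ICE gathering completes, or after `timeoutMs` at the
// latest, whichever comes first. The local SDP sent afterwards simply
// contains whatever candidates were gathered by then.
function waitForGathering(pc, timeoutMs) {
  return new Promise((resolve) => {
    const timer = setTimeout(resolve, timeoutMs);
    pc.onicegatheringstatechange = () => {
      if (pc.iceGatheringState === 'complete') {
        clearTimeout(timer);
        resolve();
      }
    };
  });
}
```

After `pc.setLocalDescription(offer)`, something like `waitForGathering(pc, 2000).then(() => signaling.send(pc.localDescription))` sends the SDP after at most two seconds.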

Basar Daldal

Jan 21, 2015, 11:11:55 AM1/21/15
to discuss...@googlegroups.com
I have noticed another thing today: even if I only use a TURN server with transport=tcp and the turns scheme, I see that Chrome sends UDP STUN packets to the TURN server in parallel with the TCP requests. Since it never receives a response to the UDP STUN requests, ICE collection only completes after the 10-second timeout, which delays call setup. Maybe this can be another topic to discuss, but I wonder why Chrome is sending UDP messages to a TURN server that is configured for TCP.
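For reference, a configuration along these lines triggers the behaviour described. This is a sketch; the server hostname and credentials are placeholders, not a real deployment:

```javascript
// RTCPeerConnection configuration with a single TCP-only TURN server
// (turns scheme, transport=tcp). Hostname and credentials are placeholders.
const config = {
  iceServers: [{
    urls: 'turns:turn.example.com:443?transport=tcp',
    username: 'user',
    credential: 'secret',
  }],
};
// In the browser: const pc = new RTCPeerConnection(config);
```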

Emil Ivov

Jan 21, 2015, 11:16:37 AM1/21/15
to discuss...@googlegroups.com

Quick comment here:

The fact that your candidate harvest has not completed does not necessarily imply a call delay.

You simply need to make sure that you exchange SDP before even starting ICE. You then send your candidates as Chrome makes them available to you.

--sent from my mobile

Alexandre GOUAILLARD

Jan 21, 2015, 8:28:23 PM1/21/15
to discuss...@googlegroups.com
Emil,

I think Basar mentioned in his first e-mail that he was aware of Trickle ICE but did not want to use it.

Alex. Gouaillard, PhD, PhD, MBA
------------------------------------------------------------------------------------
CTO - Temasys Communications, S'pore / Mountain View
President - CoSMo Software, Cambridge, MA
------------------------------------------------------------------------------------

Justin Uberti

Jan 22, 2015, 12:39:01 AM1/22/15
to discuss-webrtc
Yeah, we shouldn't be doing this. This seems like a bug. Can you file an issue at bugs.webrtc.org?

Basar Daldal

Jan 22, 2015, 3:48:20 AM1/22/15
to discuss...@googlegroups.com
I have filed an issue for the TURN TCP problem (Chrome sends UDP STUN requests in parallel with the TCP connection even if the TURN server is configured as TCP).


Yes, I was looking for a solution other than Trickle ICE.

Emil Ivov

Jan 22, 2015, 5:39:51 AM1/22/15
to discuss...@googlegroups.com
On Thu, Jan 22, 2015 at 9:48 AM, Basar Daldal <basar....@gmail.com> wrote:
> Yes, I was looking for a solution other than Trickle ICE.

Oh, OK. I guess I missed that part.

Out of curiosity, could you please share the reason for avoiding
trickle ICE? Is it legacy interop or is there anything else?

Emil
https://jitsi.org

Basar Daldal

Jan 23, 2015, 4:47:32 AM1/23/15
to discuss...@googlegroups.com
For now, I was just looking for a quick solution, assuming that implementing Trickle ICE would not be so easy.

Emil Ivov

Jan 23, 2015, 7:38:26 AM1/23/15
to discuss...@googlegroups.com

If you are using browsers on both sides then you basically get trickle for free. You just have to make sure that you send the SDP offer and answer, and the ICE candidates, as soon as each of them becomes available to you.

Most of the libraries implementing ICE can also handle trickle out of the box.

So no, there shouldn't be a significant level of increased complexity.

--sent from my mobile
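The receiving side of such a "free" trickle setup amounts to a small dispatcher: apply descriptions first, then feed candidates in as they trickle. A sketch; the message shapes are assumptions, not a fixed protocol:

```javascript
// Apply incoming signalling messages to a peer connection. The
// { type, sdp, candidate } message format is an illustrative assumption.
function handleSignalingMessage(pc, msg) {
  switch (msg.type) {
    case 'offer':
    case 'answer':
      return pc.setRemoteDescription(msg.sdp);
    case 'candidate':
      return pc.addIceCandidate(msg.candidate);
    default:
      return Promise.reject(new Error('unknown message type: ' + msg.type));
  }
}
```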

Jeremy Noring

Jan 26, 2015, 1:43:47 PM1/26/15
to discuss...@googlegroups.com
I agree it's not super hard, but our experience updating our media server to do Trickle ICE (using libnice) has been somewhat painful.  I think the number of corner cases with Trickle ICE is significant, because there are race conditions between the signaling plane and WebRTC doing ICE negotiation (for example, it generates a candidate and starts the ICE process locally before that candidate has been received by the remote party).  There are also some authentication dragons.

Justin Uberti

Jan 26, 2015, 11:33:53 PM1/26/15
to discuss-webrtc
You can have signaling/media races even with non-trickle ICE. For example, the candidates will be sent by the answerer at the same time the ICE process starts, and the initial ICE pings can beat the answer back to the caller.

Don't quite understand how authentication relates to trickle ICE either.

Jeremy Noring

Jan 27, 2015, 7:03:31 PM1/27/15
to discuss...@googlegroups.com
On Monday, January 26, 2015 at 9:33:53 PM UTC-7, Justin Uberti wrote:
> You can have signaling/media races even with non-trickle ICE. For example, the candidates will be sent by the answerer at the same time the ICE process starts, and the initial ICE pings can beat the answer back to the caller.

That's true, I had not considered that.  I assume the only way to resolve any of these is for either side to understand that an ICE ping arriving before a candidate likely means a candidate is on the way?  Or is there something I'm missing here?

> Don't quite understand how authentication relates to trickle ICE either.

The issue we hit was with ice-ufrag/ice-pwd; our media server wasn't setting those at the stream level, only with each candidate.  If the ICE pings got to libnice first, it would create a candidate but without remote credentials, so it would not bother investigating that candidate any further.  For whatever reason, when it received the actual candidate, it would not update the credentials correctly (I didn't get to the bottom of why).  Possibly this is a bug in libnice (admittedly the server is using a version that's about two years out of date), although the fix was pretty easy: nice_agent_set_remote_credentials().

Emil Ivov

Jan 27, 2015, 7:16:59 PM1/27/15
to discuss...@googlegroups.com
On Tue, Jan 27, 2015 at 6:03 PM, Jeremy Noring <jno...@hirevue.com> wrote:
> On Monday, January 26, 2015 at 9:33:53 PM UTC-7, Justin Uberti wrote:
>>
>> You can have signaling/media races even with non-trickle ICE. For example,
>> the candidates will be sent by the answerer at the same time the ICE process
>> starts, and the initial ICE pings can beat the answer back to the caller.
>
>
> That's true, I had not considered that. I assume the only way to resolve
> any of these is for either side to be able to understand that an ICE ping
> prior to a candidate likely means a candidate is on the way? Or is there
> something I'm missing here?

Why do you think they need to be resolved?

We are only talking about potentially missing a STUN retransmission here.

>> Don't quite understand how authentication relates to trickle ICE either.
>
> The issue we hit was with ice-ufrag/ice-pwd; our media server wasn't setting
> those on the stream level, only with each candidate.

This was only true for very early versions of ICE. Today standard ICE
only allows for session or media-level ufrag and pwd.

> If the ICE pings got
> to libnice first, it'd create a candidate but without remote credentials, so
> it would not bother investigating that candidate any further. For whatever
> reason, when it received the actual candidate, it would not update it
> correctly (I didn't get t. Possible this is a bug in libnice (admitted the
> server's using a version that's about two years out of date), although the
> fix was pretty easy: nice-agent-set-remote-credentials

I believe libnice has standard ICE as a mode of operation, and the
per-candidate auth makes me think you guys are using the GTalk one.
Maybe you want to switch and try the vanilla one ... although I am
not actually sure that one has trickle ...

I do admit that I expected libnice to behave better with trickle, and
that expectation may have been premature for this specific case.

Emil

--
https://jitsi.org

Justin Uberti

Jan 27, 2015, 11:13:58 PM1/27/15
to discuss-webrtc
On Tue, Jan 27, 2015 at 4:16 PM, Emil Ivov <em...@jitsi.org> wrote:
> On Tue, Jan 27, 2015 at 6:03 PM, Jeremy Noring <jno...@hirevue.com> wrote:
>> On Monday, January 26, 2015 at 9:33:53 PM UTC-7, Justin Uberti wrote:
>>> You can have signaling/media races even with non-trickle ICE. For example,
>>> the candidates will be sent by the answerer at the same time the ICE process
>>> starts, and the initial ICE pings can beat the answer back to the caller.
>>
>> That's true, I had not considered that.  I assume the only way to resolve
>> any of these is for either side to be able to understand that an ICE ping
>> prior to a candidate likely means a candidate is on the way?  Or is there
>> something I'm missing here?

Yeah - you have to create a prflx candidate, and then you can mutate that into something else if the same candidate arrives in signaling.

> Why do you think they need to be resolved?
>
> We are only talking about potentially missing a STUN retransmission here.

No, it's worse than that. If you miss the STUN packet you need to wait until all the checks of the other pairs have been cycled through to retransmit, so this can lead to a noticeable delay. We noticed this during a recent round of optimization on the call setup processing in Chrome.


Simon Perreault

Jan 28, 2015, 8:36:40 AM1/28/15
to discuss...@googlegroups.com

On Tue, Jan 27, 2015 at 11:13 PM, 'Justin Uberti' via discuss-webrtc <discuss...@googlegroups.com> wrote:
>> Why do you think they need to be resolved?
>>
>> We are only talking about potentially missing a STUN retransmission here.
>
> No, it's worse than that. If you miss the STUN packet you need to wait until all the checks of the other pairs have been cycled through to retransmit, so this can lead to a noticeable delay. We noticed this during a recent round of optimization on the call setup processing in Chrome.

It's in the RFC for a good reason...

   Once an agent has sent its offer or its answer, that agent MUST be
   prepared to receive both STUN and media packets on each candidate.

[...]

   An agent MUST be prepared to receive a Binding request on the base of
   each candidate it included in its most recent offer or answer.  This
   requirement holds even if the peer is a lite implementation.

Simon

Saravanan Bellan

Feb 7, 2015, 4:05:34 PM2/7/15
to discuss...@googlegroups.com
Just to confirm my assumption: the only difference between using Trickle ICE and not using it is in the application, where you either send the offer right away and send candidates as they come, or wait for iceGatheringState == 'complete' and then send the offer.

What we are seeing on some machines is that the first call to the onicecandidate callback comes only after a long delay (most of the time 20 seconds, sometimes more than 30 seconds, and occasionally less than 10 seconds), as can be confirmed in chrome://webrtc-internals (see attached). The Windows 7 machine has only a wired network interface enabled. I'm attaching the screenshot. The machine is running the latest version of Chrome, 40.0.2214.111. Any ideas on what could be happening?

[Attachment: webrtc_internals.PNG]

KP Singh

Dec 3, 2015, 8:51:59 AM12/3/15
to discuss-webrtc
When my web app runs in the latest version of CEF3 and tries to collect ICE candidates, I get the error below; on the other hand, the same web app works fine in the Chrome browser.

Error: Ice candidate collection interrupted after given timeout, invoking successCallback

1449137228109 - WebRtcAdaptorImpl - Signalling state changed: state= have-local-offer
1449137228110 - WebRtcAdaptorImpl - Setting ice candidate collection timeout: 3000
1449137228110 - WebRtcAdaptorImpl - ICE candidate received: sdpMLineIndex = 0, candidate = candidate:3839742356 1 udp 2122260223 
1449137228110 - WebRtcAdaptorImpl - Ice candidate collection timer exists.
1449137228110 - WebRtcAdaptorImpl - ICE candidate received: sdpMLineIndex = 0, candidate = candidate:3839742356 2 udp 2122260222 
1449137228210 - WebRtcAdaptorImpl - Ice candidate collection timer exists.
1449137228210 - WebRtcAdaptorImpl - ICE candidate received: sdpMLineIndex = 0, candidate = candidate:2858298724 1 tcp 1518280447 
1449137228210 - WebRtcAdaptorImpl - Ice candidate collection timer exists.
1449137228210 - WebRtcAdaptorImpl - ICE candidate received: sdpMLineIndex = 0, candidate = candidate:2858298724 2 tcp 1518280446 
1449137231111 - WebRtcAdaptorImpl - Ice candidate collection interrupted after given timeout, invoking successCallback.
1449137231111 - WebRtcAdaptorImpl - previous mute state of call: false
1449137231111 - WebRtcAdaptorImpl - getLocalAudioTrack
1449137231112 - WebRtcAdaptorImpl - mute Audio Track [2afc3806-11c9-473f-bc5b-781bc17942f5], call [undefined] mute=false
1449137231112 - WebRtcAdaptorImpl - getLocalVideoTrack
1449137231113 - sdpParser - getSdpDirection: type= video state= inactive
1449137231113 - sdpParser - isSdpEnabled for type video: false
1449137231114 - callManager - [callManager.start : sdp ]v=0

I have posted the same question to several WebRTC discussion portals but haven't gotten any response.
Any suggestions about the problem and a possible solution would be appreciated.
