ICE gathering stuck/failed with strange ice candidates

233 views
Skip to first unread message

Никита Егорихин

unread,
Apr 29, 2020, 10:22:05 AM4/29/20
to discuss-webrtc
 Hi,

Our team faced a very unstable and hardly reproducable issue with webrtc calls. It reproduces mostly on macOS connected through VPN but there were also reports from Windows users and users without VPN (but we couldn't dump their webrtc internals as they were very rare).

I've attached bad and good internals dumps and there is a strange thing which I couldn't understand.
In good case the ICE candidate gathering looks like this
    {
     
"time": "4/17/2020, 12:51:50 PM",
     
"type": "iceGatheringStateChange",
     
"value": "gathering"
   
},
   
{
     
"time": "4/17/2020, 12:51:50 PM",
     
"type": "onIceCandidate",
     
"value": "sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2131708102 1 udp 2122260223 192.168.0.100 54289 typ host generation 0 ufrag RZzw network-id 1 network-cost 10"
   
},
   
{
     
"time": "4/17/2020, 12:51:50 PM",
     
"type": "onIceCandidate",
     
"value": "sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:291575595 1 udp 2122194687 10.9.18.250 62954 typ host generation 0 ufrag RZzw network-id 2 network-cost 50"
   
},
   
{
     
"time": "4/17/2020, 12:51:50 PM",
     
"type": "onIceCandidate",
     
"value": "sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:831304758 1 tcp 1518280447 192.168.0.100 9 typ host tcptype active generation 0 ufrag RZzw network-id 1 network-cost 10"
   
},
   
{
     
"time": "4/17/2020, 12:51:50 PM",
     
"type": "onIceCandidate",
     
"value": "sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1608707035 1 tcp 1518214911 10.9.18.250 9 typ host tcptype active generation 0 ufrag RZzw network-id 2 network-cost 50"
   
},
   
{
     
"time": "4/17/2020, 12:51:50 PM",
     
"type": "iceGatheringStateChange",
     
"value": "complete"
   
},
as you can see four candidates are gathered: two for 10.9.18.250 (VPN) and two for 192.168.0.100 (local network)

In the bad case though the gathering is:
    {
     
"time": "4/17/2020, 12:12:03 PM",
     
"type": "iceGatheringStateChange",
     
"value": "gathering"
   
},
   
{
     
"time": "4/17/2020, 12:12:03 PM",
     
"type": "onIceCandidate",
     
"value": "sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:2131708102 1 udp 2122260223 192.168.0.100 55615 typ host generation 0 ufrag A2X5 network-id 1 network-cost 10"
   
},
   
{
     
"time": "4/17/2020, 12:12:03 PM",
     
"type": "onIceCandidate",
     
"value": "sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:831304758 1 tcp 1518280447 192.168.0.100 9 typ host tcptype active generation 0 ufrag A2X5 network-id 1 network-cost 10"
   
},
   
{
     
"time": "4/17/2020, 12:12:03 PM",
     
"type": "onIceCandidate",
     
"value": "sdpMid: 0, sdpMLineIndex: 0, candidate: candidate:1259056789 1 tcp 1518214911 10.9.18.1 9 typ host tcptype active generation 0 ufrag A2X5 network-id 2 network-cost 50"
   
},
   
{
     
"time": "4/17/2020, 12:12:14 PM",
     
"type": "stop",
     
"value": ""
   
},
   
{
     
"time": "4/17/2020, 12:12:14 PM",
     
"type": "iceConnectionStateChange",
     
"value": "closed"
   
},
   
{
     
"time": "4/17/2020, 12:12:14 PM",
     
"type": "iceGatheringStateChange",
     
"value": "complete"
   
}
you can see there are only 3 candidates: 2 for previosly seen 192.168.0.100 and only one TCP candidate 10.9.18.1 which seems like some kind of default gateway and not the computer's IP address.

I've also used the tool https://github.com/fippo/webrtc-dump-importer for importing those dumps:

Good:
good.png






Bad:
bad.png





Also note that sometimes (not in the case I've described) in such situations our client (based on JSSIP library) sends 408 RTP Timeout after several seconds. This actually means that RTCConnection's iceConnectionState is "failed".

We don't know what could we do next to troubleshoot those issues and what is going wrong in such cases.

Could you please assist with it?
webrtc_internals_dump_good.txt
webrtc_internals_dump_bad.txt
ipconfig.txt

Taylor Brandstetter

unread,
Apr 29, 2020, 3:28:29 PM4/29/20
to discuss-webrtc
Could you take a look at the Chrome debug log? I suspect you'll see "UDP bind failed with error".

I've seen cases before where an error occurs when you try to bind to the VPN address directly, but not when binding to the wildcard address. See this bug: https://bugs.chromium.org/p/webrtc/issues/detail?id=9313

I've only seen this on Android so far but I wouldn't be surprised if it happened on other OSs.

--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/discuss-webrtc/aa444b9b-f84e-46bf-b6ff-9b15769ea1b5%40googlegroups.com.
Message has been deleted
Message has been deleted

Никита Егорихин

unread,
May 2, 2020, 11:30:49 AM5/2/20
to discuss-webrtc
Thanks for the reply!
We will totally look at debug logs but there is a problem with that approach: to enable logs we should restart the app (it's based on electron). And the thing is restarting fixes this issue and after it ports are bound correctly. As far as I understand, the issue could be in some kind of internal chrome caching which saves IPs for ICE candidates somewhere and restarting just starts those caches over and the issue is resolved. Do you know any way to maybe reset the state of WebRTC from the client js side?

Also I'm not quite understand what's the state of the bug you've mentioned: it was fixed two years ago, then reverted, moved to Resolved and yesterday it was moved to Opened once again. So am I getting it correctly it's not fixed in any version of chrome?

Again, thanks for your help with that.

среда, 29 апреля 2020 г., 17:22:05 UTC+3 пользователь Никита Егорихин написал:

Taylor Brandstetter

unread,
May 4, 2020, 3:25:42 PM5/4/20
to discuss-webrtc
Correct, the linked bug is not fixed in any stable version of Chrome.

But it sounds like you ran into something different. It is possible that "the electron Chrome part" somehow missed a network interface change and was giving WebRTC an old, obsolete IP address to attempt to bind to, as you suggest. For example, here's what's responsible for that on Windows.

However, the only known issue with that code has to do with IPv6 addresses, so I don't know if that's really the issue. Unfortunately I don't have a suggestion other than to wait until it reproduces again with logs.

On Sat, May 2, 2020 at 8:30 AM Никита Егорихин <egor...@gmail.com> wrote:
Thank you for your help. We will definitely try to take a look on those logs.

The problem with logs though is that our application is based on electron and should be restarted in order to pass commandline arguments. But for some reason restart fixes the issue and ports are bound correctly. As far as i understand it, something in electron's chrome part has cached something like IP addresses for ICE candidates or something like that and restart just resets all those caches and it is working after that. I doubt it but maybe there is some way to emulate this behavior by the client code?

I'm not quite understand what is the resolution of the bug you've linked: 2 years ago it was merged, then reverted, then moved to "Fixed" and yesterday it was moved back to open because of a revert, so I'm getting it correctly that it's not fixed in any of chrome versions right now?



среда, 29 апреля 2020 г., 22:28:29 UTC+3 пользователь Taylor Brandstetter написал:
To unsubscribe from this group and stop receiving emails from it, send an email to discuss...@googlegroups.com.

--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages