Chrome relay candidate priority bug?


Jeremy Noring

Nov 13, 2015, 6:46:57 PM
to discuss-webrtc
I just filed https://code.google.com/p/chromium/issues/detail?id=555790, and wanted to get some feedback on it.  I think this is the bug that's been giving me grief for a few weeks now (seems much more pronounced with Chrome 47).  Basically, if Chrome is supplied with multiple TURN servers, it will produce candidates with identical priority.
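
For anyone who wants to reproduce, here is a minimal sketch of the kind of setup that triggers it -- the TURN URLs and credentials are placeholders, and I'm using the promise-based API for brevity:

// Sketch only: two TURN servers on one connection (placeholder URLs/credentials).
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'turn:turn-a.example.com:3478', username: 'user', credential: 'secret' },
    { urls: 'turn:turn-b.example.com:3478', username: 'user', credential: 'secret' }
  ]
});

pc.onicecandidate = (event) => {
  if (!event.candidate) return;
  // Candidate string: "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
  const fields = event.candidate.candidate.split(' ');
  if (fields[7] === 'relay') {
    console.log('relay candidate priority:', fields[3]); // same value shows up for both TURN servers
  }
};

pc.createDataChannel('probe');                      // just to have something to negotiate
pc.createOffer().then((offer) => pc.setLocalDescription(offer));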

Questions:
  1. Given multiple TURN servers, how should Chrome be generating the priorities of the respective candidates?  According to RFC 5245 it's a party foul to generate candidates with duplicate priorities, although that same RFC explicitly states it has nothing to say about the multiple-TURN-server situation.  (See the priority sketch just after this list.)
  2. Is there a workaround to this that anyone can see?
  3. Bigger picture question: for people like us using an SFU who really, honestly do not care about ICE negotiation (we always relay through a central server, for numerous reasons I won't get into here, but it's essentially the only reasonable way to go about it), is there an easier way to do this?  I understand ICE is critical for p2p, but for us, it's basically nothing but a headache.  
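
To make question 1 concrete, RFC 5245 (section 4.1.2.1) builds a candidate's priority from a type preference, a local preference and the component ID, so two relay candidates collide whenever they are given the same type and local preference.  A sketch (the type preferences here are the RFC's recommended values, not necessarily the ones Chrome uses):

// priority = (2^24) * typePreference + (2^8) * localPreference + (256 - componentId)
function candidatePriority(typePreference, localPreference, componentId) {
  return (1 << 24) * typePreference + (1 << 8) * localPreference + (256 - componentId);
}

// RFC 5245 recommends: host 126, peer reflexive 110, server reflexive 100, relayed 0.
candidatePriority(0, 65535, 1); // relay candidate from TURN server A
candidatePriority(0, 65535, 1); // relay candidate from TURN server B -- identical
// Varying the local preference per configured server (roughly what the fix
// discussed later in this thread does) keeps them unique:
candidatePriority(0, 65534, 1);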
Thanks in advance.

Justin Uberti

Nov 13, 2015, 7:52:34 PM
to discuss-webrtc
Interesting question. Can you file this in the webrtc tracker instead (as it's not Chrome-specific)?

Why do you need relay candidates if you have a central server? I agree that in your situation you can take a much simpler approach, e.g. ICE Lite.


Jeremy Noring

Nov 14, 2015, 1:32:55 PM
to discuss-webrtc

The reason I filed it on the Chrome tracker is that this does appear to be fixed in the webrtc codebase; see third_party/libjingle/source/talk/app/webrtc/portallocatorfactory.cc.  However, Chrome doesn't use that; it uses content/renderer/p2p/port_allocator.cc, which is missing the fix that tweaks priority (see git commit 88853c77c292bbaeb93f79bfe1dee6f95f70b384, which gives each TURN server a different priority).

Regarding why we need relay candidates if we have a central server, the biggest reason for us is that TURN servers resolve via DNS.  We have several customers whose networks do DNAT; that is, everything internally resolves to an address that is different from the actual public address.  Example: https://use1-myserver.aws.com resolves to 50.40.30.20 on the open internet, but inside customer network X it resolves to 10.9.8.7, which routes the request through some hideous NAT infrastructure that makes the translation 10.9.8.7 -> 50.40.30.20.

So for normal WebRTC negotiation, our server is going to hand back IP addresses that say "hey, you can reach me at 50.40.30.20", but inside that network, an attempt to establish a connection to that address will be summarily blocked.  But because TURN resolves by domain name, we can pass in https://use1-myserver.aws.com:someport, and it'll end up resolving that to 10.9.8.7, which is allowed to flow through their NAT and translated into the correct public address on the other end.
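
To make that concrete, the TURN entry is the one place we get to hand the browser a hostname instead of an IP; something like this (the turn: scheme, port and credentials below are illustrative):

const pc = new RTCPeerConnection({
  iceServers: [{
    // The browser resolves this name itself, so inside customer network X it
    // gets 10.9.8.7 and on the open internet it gets 50.40.30.20.
    urls: 'turn:use1-myserver.aws.com:3478?transport=udp',
    username: 'user',    // placeholder
    credential: 'secret' // placeholder
  }]
});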

If you know of another way of handling this, I'm definitely interested.

Regarding why we need multiple servers: originally we had our TURN traffic on a port we chose--let's say 54321; some users claimed that was non-standard and insisted we use 3478 instead (debatable, but... okay).  We unfortunately have dependencies on both now.  I'm also totally interested if you know of a workaround here.

Justin Uberti

Nov 16, 2015, 1:51:33 AM
to discuss-webrtc
On Sat, Nov 14, 2015 at 10:32 AM, Jeremy Noring <jno...@hirevue.com> wrote:

The reason I filed it on the Chrome tracker is that this does appear to be fixed in the webrtc codebase; see third_party/libjingle/source/talk/app/webrtc/portallocatorfactory.cc.  However, Chrome doesn't use that; it uses content/renderer/p2p/port_allocator.cc, which is missing the fix that tweaks priority (see git commit 88853c77c292bbaeb93f79bfe1dee6f95f70b384, which gives each TURN server a different priority).

Regarding why we need relay candidates if we have a central server, the biggest reason for us is that TURN servers resolve via DNS.  We have several customers whose networks do DNAT; that is, everything internally resolves to an address that is different from the actual public address.  Example: https://use1-myserver.aws.com resolves to 50.40.30.20 on the open internet, but inside customer network X it resolves to 10.9.8.7, which routes the request through some hideous NAT infrastructure that makes the translation 10.9.8.7 -> 50.40.30.20.

So for normal WebRTC negotiation, our server is going to hand back IP addresses that say "hey, you can reach me at 50.40.30.20", but inside that network, an attempt to establish a connection to that address will be summarily blocked.  But because TURN resolves by domain name, we can pass in https://use1-myserver.aws.com:someport, and it'll end up resolving that to 10.9.8.7, which is allowed to flow through their NAT and translated into the correct public address on the other end. 

If you know of another way of handling this, I'm definitely interested.

Have you considered sending back ICE candidates with DNS names? It's legal, although I am not sure we have implemented that yet. 
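
Roughly what that would look like on the receiving side if the browser resolved it (illustrative values, modern promise API):

// Assuming a peer connection whose remote description is already set:
const pc = new RTCPeerConnection();
// ... offer/answer exchange elided ...
pc.addIceCandidate({
  sdpMid: 'audio',
  sdpMLineIndex: 0,
  // An FQDN in the connection-address field -- allowed by the SDP grammar,
  // but the receiving agent has to resolve it itself.
  candidate: 'candidate:1 1 udp 2013266431 use1-myserver.aws.com 19352 typ host generation 0'
}).catch((err) => console.warn('candidate rejected:', err));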

Regarding why we need multiple servers: originally we had our TURN traffic on a port we chose--let's say 54321; some users claimed that was non-standard and insisted we use 3478 instead (debatable, but... okay).  We unfortunately have dependencies on both now.  I'm also totally interested if you know of a workaround here.

I didn't follow what you were saying here. 



Jeremy Noring

Nov 16, 2015, 12:33:01 PM
to discuss-webrtc
On Sunday, November 15, 2015 at 11:51:33 PM UTC-7, Justin Uberti wrote:

Have you considered sending back ICE candidates with DNS names? It's legal, although I am not sure we have implemented that yet. 

We have, but you are correct: the last time we looked, WebRTC did not support ICE candidates with DNS names.  It's possible this may have changed since I last looked.
 

Regarding why we need multiple servers: originally we had our TURN traffic on a port we chose--let's say 54321; some users claimed that was non-standard and insisted we use 3478 instead (debatable, but... okay).  We unfortunately have dependencies on both now.  I'm also totally interested if you know of a workaround here.

I didn't follow what you were saying here. 

Sorry, let me try again.

A lot of our users are big enterprises, and the only way to make them work with WebRTC is to have them punch holes in their firewalls; they generally insist on specific destination ports and addresses.  Example: they allow port 30000 to use1-blah.aws.com and use2-blah.aws.com.  We run a TURN server on port 30000 at those URIs.

However, some users complained that port 30000 was non-standard and insisted on using port 3478; we have our TURN server at those locations listen on both of those ports.  I believe this may also be part of the issue we're experiencing: same TURN server, multiple listening ports.
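
Concretely, the clients end up configured with the same TURN host on both ports, something like (host and credentials are placeholders):

// Same TURN daemon, reachable on the standard port and the firewall-approved one.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'turn:use1-blah.aws.com:3478',  username: 'user', credential: 'secret' },
    { urls: 'turn:use1-blah.aws.com:30000', username: 'user', credential: 'secret' }
  ]
});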

Peter Thatcher

Nov 16, 2015, 2:21:46 PM
to discuss-webrtc


On Sunday, November 15, 2015 at 10:51:33 PM UTC-8, Justin Uberti wrote:



Have you considered sending back ICE candidates with DNS names? It's legal, although I am not sure we have implemented that yet. 

There's a patch that adds support here:  


But it never landed and it's been stale for 9 months.

Jeremy Noring

Nov 17, 2015, 5:11:27 PM
to discuss-webrtc
So I've done a bunch more debugging and packet capture, and although the duplicate priority is a definite bug, there is some other very serious bug going on, and I'm about 95% sure it's in Chrome 47 (and *not* in Chrome 46).  Also, we can reproduce with only a single TURN server; what happens in that instance is that Chrome ends up sending data directly to Licode, but the back-channel path ends up going through TURN.  Chrome rejects the back-channel.

Here's what's happening in STUN land:
  1. Early on, Chrome sends a binding request because we're using TURN.  That binding request gets turned into a peer reflexive candidate by libnice.
  2. Chrome ultimately picks some candidate that goes direct to libnice.
  3. *After* we've selected that, Chrome sends a binding request to libnice--it's almost as though the TURN candidate in Chrome isn't being stopped or canceled.  libnice interprets this as Chrome wanting to switch, and moves over to using that.
  4. At this point, Chrome is sending direct to libnice, but libnice's return path is through TURN.
I'm not sure if you guys have tests that include TURN, but there is something deeply wrong with Chrome 47.  I'm happy to share packet captures taken from our media server that show all of this in action.

Justin Uberti

Nov 17, 2015, 8:21:36 PM
to discuss-webrtc
It's not clear this is incorrect behavior. Chrome does aggressive nomination, and will try to establish connections on both direct and relayed paths. 

The server should use the highest-priority candidate pair.
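
For reference, the pair priority being compared here is the RFC 5245 (section 5.7.2) one; a minimal sketch:

// G = controlling agent's candidate priority, D = controlled agent's.
function pairPriority(G, D) {
  return Math.pow(2, 32) * Math.min(G, D) + 2 * Math.max(G, D) + (G > D ? 1 : 0);
}
// The controlled side (the SFU here) should keep sending on whichever
// nominated pair has the larger pairPriority.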

Jeremy Noring

Nov 18, 2015, 1:31:08 PM
to discuss-webrtc
You're right--it's not clear.  

So I did more packet captures, and now I fully understand the issues I'm seeing--there are multiple bugs in Chrome's TURN priority computation.  They are present in both 46 and 47, are directly related to use of a TURN server, and are exacerbated by 47's new "continuous negotiation lite" features.
  1. Bug #1: STUN connectivity checks that arrive *before* the remote party has received any candidates all carry the same priority, regardless of where they originated.  We see priority 1853824767 sent in all STUN connectivity checks, including those that arrive via a TURN server.  Why is this wrong?  The remote party cannot know that a packet originated from a TURN server, so it is up to the agent sending the STUN connectivity check to correctly communicate priority.  This causes the connection to fail when the TURN path establishes first and a subsequent attempt to swap to a direct peer-reflexive connection succeeds.
  2. Bug #2: when supplied with multiple TURN servers, Chrome gives all of them identical priority.  That produces the same failure mode, because duplicate priorities create a race condition.
Attached is a packet capture of Chrome 47 connecting to libnice.  Chrome 47 is controlling, libnice is controlled.  Packet capture details: this was taken on the media server.  66.219.246.238 is the external address of Chrome 47, 10.8.4.143 is the internal address of Chrome 47, 52.0.208.176 is the external address of our media server, and 10.192.0.164 is the internal address of our media server.  Our TURN server lives ON the media server, and is listening on ports 3478 and 19350.

Here is the offer from Chrome to libnice:

v=0
o=- 20130234262413483 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS dXgXF9BkR7usdUzQvnDT1KfMcBNK5z3Fs4Ql
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:2y/OcHzMnWO3y5l8
a=ice-pwd:q6ET9MYGvayDyVNHml1Bke/v
a=fingerprint:sha-256 6F:43:BA:23:6A:DF:5D:6B:A4:3E:19:09:F2:8B:06:91:7B:28:1E:09:D5:62:9D:06:2F:0B:4C:5A:FF:E6:31:9C
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=sendrecv
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10; useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=maxptime:60
a=ssrc:865206515 cname:4d5ZKIDpY8Vp4+3X
a=ssrc:865206515 msid:dXgXF9BkR7usdUzQvnDT1KfMcBNK5z3Fs4Ql a1f81d98-276b-4790-9ff2-1400474397d1
a=ssrc:865206515 mslabel:dXgXF9BkR7usdUzQvnDT1KfMcBNK5z3Fs4Ql
a=ssrc:865206515 label:a1f81d98-276b-4790-9ff2-1400474397d1
m=video 9 UDP/TLS/RTP/SAVPF 100 116 117 96
b=AS:300
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:2y/OcHzMnWO3y5l8
a=ice-pwd:q6ET9MYGvayDyVNHml1Bke/v
a=fingerprint:sha-256 6F:43:BA:23:6A:DF:5D:6B:A4:3E:19:09:F2:8B:06:91:7B:28:1E:09:D5:62:9D:06:2F:0B:4C:5A:FF:E6:31:9C
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:4 urn:3gpp:video-orientation
a=sendrecv
a=rtcp-mux
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
a=ssrc-group:FID 2991887344 125788295
a=ssrc:2991887344 cname:4d5ZKIDpY8Vp4+3X
a=ssrc:2991887344 msid:dXgXF9BkR7usdUzQvnDT1KfMcBNK5z3Fs4Ql 7aaa2ec5-ecbc-4ee4-a1f2-f742de19077b
a=ssrc:2991887344 mslabel:dXgXF9BkR7usdUzQvnDT1KfMcBNK5z3Fs4Ql
a=ssrc:2991887344 label:7aaa2ec5-ecbc-4ee4-a1f2-f742de19077b
a=ssrc:125788295 cname:4d5ZKIDpY8Vp4+3X
a=ssrc:125788295 msid:dXgXF9BkR7usdUzQvnDT1KfMcBNK5z3Fs4Ql 7aaa2ec5-ecbc-4ee4-a1f2-f742de19077b
a=ssrc:125788295 mslabel:dXgXF9BkR7usdUzQvnDT1KfMcBNK5z3Fs4Ql
a=ssrc:125788295 label:7aaa2ec5-ecbc-4ee4-a1f2-f742de19077b


...here is the answer from libnice:

v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS T8T1BYrnIM
m=audio 1 RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:lSxy
a=ice-pwd:k5u8IIADTc50HGvTBOwA9Q
a=fingerprint:sha-256 E0:A1:DB:84:28:D3:1D:4C:72:B1:BE:4D:54:37:C7:51:48:CB:C9:46:23:24:5C:7C:CE:16:FD:07:31:32:79:70
a=sendrecv
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=candidate:1 1 udp 2013266431 10.192.0.164 19352 typ host generation 0
a=candidate:2 1 udp 1677721855 52.0.208.176 19352 typ srflx raddr 10.192.0.164 rport 19352 generation 0
m=video 1 RTP/SAVPF 100 116 117
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:lSxy
a=ice-pwd:k5u8IIADTc50HGvTBOwA9Q
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=fingerprint:sha-256 E0:A1:DB:84:28:D3:1D:4C:72:B1:BE:4D:54:37:C7:51:48:CB:C9:46:23:24:5C:7C:CE:16:FD:07:31:32:79:70
a=sendrecv
a=mid:video
a=rtcp-mux
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=candidate:1 1 udp 2013266431 10.192.0.164 19352 typ host generation 0
a=candidate:2 1 udp 1677721855 52.0.208.176 19352 typ srflx raddr 10.192.0.164 rport 19352 generation 0


...and here are the candidates that are trickled to libnice after OFFER was returned:


[ { sdpMLineIndex: 0,
    sdpMid: 'audio',
    candidate: 'a=candidate:2879613271 1 udp 2122260223 10.8.4.143 51741 typ host generation 0' },
  { sdpMLineIndex: 0,
    sdpMid: 'audio',
    candidate: 'a=candidate:1216540603 1 udp 1686052607 66.219.246.238 51741 typ srflx raddr 10.8.4.143 rport 51741 generation 0' },
  { sdpMLineIndex: 0,
    sdpMid: 'audio',
    candidate: 'a=candidate:3844117927 1 tcp 1518280447 10.8.4.143 0 typ host tcptype active generation 0' },
  { sdpMLineIndex: 0,
    sdpMid: 'audio',
    candidate: 'a=candidate:4012147109 1 udp 41885439 52.0.208.176 61006 typ relay raddr 66.219.246.238 rport 51741 generation 0' },
  { sdpMLineIndex: 0,
    sdpMid: 'audio',
    candidate: 'a=candidate:4012147109 1 udp 41885439 52.0.208.176 57815 typ relay raddr 66.219.246.238 rport 51741 generation 0' },
  { sdpMLineIndex: 0,
    sdpMid: 'audio',
    candidate: 'a=candidate:4012147109 1 udp 25108223 52.0.208.176 65238 typ relay raddr 66.219.246.238 rport 51693 generation 0' } ]


Here's the problem: see packets 27, 28, 29 and 30.  That is a STUN binding request sent from Chrome to port 3478 (so via TURN) with a priority of 1853824767, which is a peer-reflexive candidate priority.  It should be a relay candidate priority.  That causes libnice to incorrectly compute pair priorities, because it simply assumes the remote end is sending the correct priority.  In packet 31 you can see an actual direct connection (to port 19352, which is the port libnice ends up listening on).  This results in a race condition between the TURN path and the direct path.
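
As a sanity check, 1853824767 unpacks as a peer-reflexive priority under the RFC 5245 formula (a sketch; the comparison type preferences are read straight off the candidates trickled above):

// priority = (2^24)*typePref + (2^8)*localPref + (256 - componentId)
function unpackPriority(p) {
  return {
    typePref: Math.floor(p / (1 << 24)),
    localPref: (p >> 8) & 0xffff,
    componentId: 256 - (p & 0xff)
  };
}
unpackPriority(1853824767); // { typePref: 110, localPref: 32542, componentId: 1 }
// For comparison, the trickled candidates above unpack to type preferences of
// 126 (host, 2122260223), 100 (srflx, 1686052607) and 2 (relay, 41885439),
// so 110 sits where a peer-reflexive candidate's preference would be.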


chrome47-use1-udp-only.pcap

Jeremy Noring

Nov 18, 2015, 4:35:12 PM
to discuss-webrtc
I'm wrong.  *facepalm*

But the source of our issue is running the TURN server on the same machine as our media server and having it be aware of its external IP address.  That causes a mismatch with the candidates Chrome sends (it sends them with the external IP address, not the internal one), which means the peer-reflexive candidates libnice receives never get updated with the correct priority.

I think these two things are still arguably bugs; technically #1 is per the standard (RFC 5245 has connectivity checks carry the priority the candidate would have as a peer-reflexive candidate), but it's nonsense ever to send a STUN request through TURN with peer-reflexive priority.  #2 is incorrect per the spec.

Thanks.

Christoffer Jansson

Nov 19, 2015, 3:22:44 AM
to discuss-webrtc
Hi Jeremy,

Thorough investigation, I like it ;).

Could you file bugs for #1 and #2?

For WebRTC standards issues, please file them at https://github.com/w3c/webrtc-pc, and for WebRTC-in-Chrome issues, at https://code.google.com/p/chromium/issues/entry.

Thanks!

/Chris
