Why does my WebRTC app not connect two peers using "srflx" addresses over the internet (with only a STUN server)?


Eliomar Conde

Aug 12, 2020, 12:30:02 AM
to discuss-webrtc
Hey people, my name is Eliomar.
I am new to WebRTC and I am trying to develop a simple WebRTC tool that works over the internet.

I am trying to launch my server on the internet with just a STUN server (I can do that, right? Or is a TURN server mandatory, even if I do not need it for most connections?).

I am running two peers on the internet and trying to establish a WebRTC connection between them using my signaling server, which only provides STUN information and not TURN information.

The problem is the following:
SOMETIMES IT CONNECTS AND SOMETIMES IT DOES NOT (most of the time it does not).
Since it sometimes connects, I assume that the TURN server is not necessary. But I would like to understand why this is happening. Let me explain the situation in more detail:

In my application, both peers provide host (many candidates) and srflx candidates, for example:

candidate:953528780 1 tcp 1518280447 10.81.41.1 9 typ host tcptype active generation 0 network-id 1 network-cost 900
candidate:1985367356 1 udp 2122260223 10.81.41.1 49058 typ host generation 0 network-id 1 network-cost 900
candidate:2507656144 1 udp 1686052607 147.241.212.105 8546 typ srflx raddr 10.81.41.1 rport 49058 generation 0 network-id 1 network-cost 900
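
For readers less familiar with the candidate format, here is a minimal sketch of pulling the relevant fields out of such a line. The names `ParsedCandidate` and `parseCandidate` are made up for illustration and are not part of any WebRTC API; the sketch only handles the fields discussed in this thread.

```typescript
// Fields of an ICE candidate line that matter for this discussion.
// (Illustrative type, not a WebRTC API.)
interface ParsedCandidate {
  foundation: string;
  component: number;
  protocol: string;
  priority: number;
  address: string;
  port: number;
  type: string;            // "host", "srflx", "prflx" or "relay"
  relatedAddress?: string; // raddr: for srflx, the local address behind the NAT
  relatedPort?: number;    // rport: for srflx, the local port behind the NAT
}

// Parse "candidate:..." or "a=candidate:..." into its fields.
// Returns null if the line does not look like a candidate.
function parseCandidate(line: string): ParsedCandidate | null {
  const text = line.trim().replace(/^a=/, "");
  if (text.indexOf("candidate:") !== 0) return null;
  const parts = text.slice("candidate:".length).split(/\s+/);
  // Layout: foundation component protocol priority address port "typ" type ...
  if (parts.length < 8 || parts[6] !== "typ") return null;

  const parsed: ParsedCandidate = {
    foundation: parts[0],
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7],
  };
  // The remaining tokens are key/value pairs such as "raddr 10.81.41.1".
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === "raddr") parsed.relatedAddress = parts[i + 1];
    if (parts[i] === "rport") parsed.relatedPort = Number(parts[i + 1]);
  }
  return parsed;
}

// The srflx candidate quoted above.
const srflx = parseCandidate(
  "candidate:2507656144 1 udp 1686052607 147.241.212.105 8546 typ srflx " +
    "raddr 10.81.41.1 rport 49058 generation 0 network-id 1 network-cost 900"
);
console.log(srflx);
```

For the srflx line, `address`/`port` are the public mapping reported by the STUN server, while `relatedAddress`/`relatedPort` are the local socket the mapping belongs to.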

"When both parties provide a srflx packet, it means both parties should be connectable via a STUN-only setup but does not mean both parties can connect to each other with such.
It is likely that they can, however."

Can someone tell me when that is not possible?
What may be happening? I also checked some data from a failed session (no connection) and a successful session.
THE CANDIDATES WERE EXACTLY THE SAME

Does that make any sense?
I also checked the webrtc-internals page, and for the failed session I got the following output for iceConnectionStateChange and connectionStateChange:

Screenshot from 2020-08-12 01-17-06.png

And the successful session: 
Screenshot from 2020-08-12 01-20-22.png

Another difference between a failed session and a successful session was the currentLocalDescription property of one of the peers. In a failed session the currentLocalDescription had no candidates, while in the successful session it had host and srflx candidates.
E.g.:
a=candidate:953528780 1 tcp 1518280447 10.81.41.1 9 typ host tcptype active generation 0 network-id 1 network-cost 900
a=candidate:1985367356 1 udp 2122260223 10.81.41.1 49058 typ host generation 0 network-id 1 network-cost 900
a=candidate:2507656144 1 udp 1686052607 179.241.212.105 8546 typ srflx raddr 10.81.41.1 rport 49058 generation 0 network-id 1 network-cost 900

Should the localDescription property also contain the local candidates that I sent to the other peer? If so, isn't that automatic? Maybe my signaling process is wrong? What needs to happen in order to get a reliable STUN srflx connection?

I appreciate any help. Please also point me to other documents that you think could contain the answer.
Thanks in advance to everyone.

Regards.
Eliomar.


Eliomar Conde

Aug 12, 2020, 12:44:47 AM
to discuss-webrtc
Btw, I have to say that the WebRTC app works very well locally, without problems.
I also ran this test on my PC: https://test.webrtc.org/
I obtained the result: [ WARN ] Could not connect using reflexive candidates, likely due to the network environment/configuration.

But then I checked this page, https://support.google.com/googlenest/thread/42719875?hl=en, and it says that the output is OK because the test has some problems.
Therefore it should work well for peers that are not in the same network (my case), which makes sense given the times my app does connect.

Philipp Hancke

Aug 12, 2020, 3:09:05 AM
to discuss...@googlegroups.com
On Wed, Aug 12, 2020 at 06:30, Eliomar Conde <elioc...@gmail.com> wrote:
Hey people, my name is Eliomar.
I am new to WebRTC and I am trying to develop a simple WebRTC tool that works over the internet.

I am trying to launch my server on the internet with just a STUN server (I can do that, right? Or is a TURN server mandatory, even if I do not need it for most connections?).

It is not mandated. However, things just don't work when not using one, which is the actual reason lots of people spend a lot of money on running them.

I am running two peers on the internet and trying to establish a WebRTC connection between them using my signaling server, which only provides STUN information and not TURN information.

The problem is the following:
SOMETIMES IT CONNECTS AND SOMETIMES IT DOES NOT (most of the time it does not).
Since it sometimes connects, I assume that the TURN server is not necessary. But I would like to understand why this is happening. Let me explain the situation in more detail:

In my application, both peers provide host (many candidates) and srflx candidates, for example:

candidate:953528780 1 tcp 1518280447 10.81.41.1 9 typ host tcptype active generation 0 network-id 1 network-cost 900
candidate:1985367356 1 udp 2122260223 10.81.41.1 49058 typ host generation 0 network-id 1 network-cost 900
candidate:2507656144 1 udp 1686052607 147.241.212.105 8546 typ srflx raddr 10.81.41.1 rport 49058 generation 0 network-id 1 network-cost 900

"When both parties provide a srflx packet, it means both parties should be connectable via a STUN-only setup but does not mean both parties can connect to each other with such.
It is likely that they can, however."

*Likely*, not guaranteed.
 
Can someone tell me when that is not possible?

Symmetric NAT; the usual hole punching techniques simply don't work there.
 
What may be happening? I also checked some data from a failed session (no connection) and a successful session.
THE CANDIDATES WERE EXACTLY THE SAME

Exactly the same is unlikely since the ports are randomly bound, no?


Does that make any sense?
I also checked the webrtc-internals page, and for the failed session I got the following output for iceConnectionStateChange and connectionStateChange:

Screenshot from 2020-08-12 01-17-06.png

And the successful session: 
Screenshot from 2020-08-12 01-20-22.png

Is this between the *same* pair of machines and the same infrastructure, in particular the nat routers in between?
If it is inconsistent between the same pair of machines, that might indicate a bug in your signalling, as it should be fairly deterministic.
There are bugs like https://bugs.chromium.org/p/webrtc/issues/detail?id=5813& but their impact is hard to quantify beyond "fairly low probability".

Another difference between a failed session and a successful session was the currentLocalDescription property of one of the peers. In a failed session the currentLocalDescription had no candidates, while in the successful session it had host and srflx candidates.
E.g.:
a=candidate:953528780 1 tcp 1518280447 10.81.41.1 9 typ host tcptype active generation 0 network-id 1 network-cost 900
a=candidate:1985367356 1 udp 2122260223 10.81.41.1 49058 typ host generation 0 network-id 1 network-cost 900
a=candidate:2507656144 1 udp 1686052607 179.241.212.105 8546 typ srflx raddr 10.81.41.1 rport 49058 generation 0 network-id 1 network-cost 900

Should the localDescription property also contain the local candidates that I sent to the other peer? If so, isn't that automatic?

In general, pc.localDescription should contain all candidates that have been signalled via onicecandidate. If you see a difference between .localDescription and .currentLocalDescription, file a bug.
 
Maybe my signaling process is wrong? What needs to happen in order to get a reliable STUN srflx connection?

The general consensus is that this isn't possible without a TURN server and paying the cost for it.
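
As a rough illustration of what "adding a TURN server" means on the client side, here is a sketch of a configuration with both a STUN and a TURN entry. The host names, user name, and password are placeholders (assumptions, not real services), and `IceServerEntry`/`PeerConfig` are local stand-ins for the browser's configuration types so the snippet can be checked outside a browser.

```typescript
// Local stand-ins for the shapes the browser expects in the
// RTCPeerConnection constructor, so the sketch type-checks anywhere.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}
interface PeerConfig {
  iceServers: IceServerEntry[];
  iceTransportPolicy?: "all" | "relay";
}

// HYPOTHETICAL servers and credentials: replace with your own deployment.
const peerConfig: PeerConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: [
        "turn:turn.example.org:3478?transport=udp",
        "turn:turn.example.org:3478?transport=tcp",
      ],
      username: "demo-user",
      credential: "demo-password",
    },
  ],
  // "all" allows host, srflx and relay candidates.
  // "relay" would restrict ICE to TURN only, which is handy for
  // checking that the TURN server itself works.
  iceTransportPolicy: "all",
};

// In a browser: const pc = new RTCPeerConnection(peerConfig);
console.log(JSON.stringify(peerConfig, null, 2));
```

With such a configuration, direct (host/srflx) pairs can still be used where the NATs allow it, and the relay candidate is there for the cases where they do not.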
 
I appreciate any help. Please also point me to other documents that you think could contain the answer.


Thanks in advance to everyone.

sorry if this turned into a bit of a rant :-)

Regards.
Eliomar.


--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/discuss-webrtc/bd5eac43-13e3-445f-aa1f-8a648cf86761n%40googlegroups.com.

Eliomar Conde

Aug 12, 2020, 11:49:40 AM
to discuss-webrtc
Hey Philipp. Thanks for taking the time to answer my questions, I really appreciate it =).

It is not mandated. However, things just don't work when not using one, which is the actual reason lots of people spend a lot of money on running them.
 
So the TURN server is not mandatory, but things do not work well without it? Would you suggest that I add a TURN server? Where can I get a TURN server service? Could you recommend a solution?


"When both parties provide a srflx packet, it means both parties should be connectable via a STUN-only setup but does not mean both parties can connect to each other with such.
It is likely that they can, however."

*Likely*, not guaranteed.
 
Can someone tell me when that is not possible?

Symmetric NAT; the usual hole punching techniques simply don't work there.

Ahmmm ok ok. I understand now. Well, this is not my case right now.
 
 
What may be happening? I also checked some data from a failed session (no connection) and a successful session.
THE CANDIDATES WERE EXACTLY THE SAME

Exactly the same is unlikely since the ports are randomly bound, no?

Yes, you are right, they are not exactly the same. The ports are different, and sometimes the candidates are repeated with different values of the variables "sdpMid" and "sdpMLineIndex".
What do these values mean?
 
Is this between the *same* pair of machines and the same infrastructure, in particular the nat routers in between?
If it is inconsistent between the same pair of machines, that might indicate a bug in your signalling, as it should be fairly deterministic.
There are bugs like https://bugs.chromium.org/p/webrtc/issues/detail?id=5813& but their impact is hard to quantify beyond "fairly low probability".

Yes, I am testing between my PC (wifi internet) and my phone on a 4G LTE network (not the same provider as the wifi internet). Sometimes it works and sometimes not.
I have also tested with other wifi internet providers on computers, but the result is the same: sometimes it connects and sometimes not.
Maybe it is the order of my signaling process? I will check that. I understand that I may be having problems with some asynchronous tasks, right? I based my signaling process on the Google codelab at this link: https://codelabs.developers.google.com/codelabs/webrtc-web
(modifying it, of course).
But ok, I will check my signaling process. It's just that I think the signaling process is quite a simple thing, right? It just passes a message from one peer to the other, right? For that reason I think the problem may be in the order.

What I do is the following:
1) Send the offer.
2) Start gathering candidates and sending them (as I understand it, candidate gathering starts automatically right after calling .setLocalDescription()).
3) Simultaneously, the second peer receives the offer and sends the answer (before it starts receiving candidates from the first peer, but I am not 100% sure of that because it is asynchronous, right?).
4) Send candidates from the first peer to the second.
5) And then the exchange of candidates.

Is there anything wrong, or am I missing something?
 
In general, pc.localDescription should contain all candidates that have been signalled via onicecandidate. If you see a difference between .localDescription and .currentLocalDescription, file a bug.

Ok ok, so on each call of the "onicecandidate" event handler those candidates should be added to the localDescription? Hmmm, I will check this, Philipp.
I only looked at .currentLocalDescription and not .localDescription, but I will compare them and share the results here, ok?
 
Maybe my signaling process is wrong? What needs to happen in order to get a reliable STUN srflx connection?

The general consensus is that this isn't possible without a TURN server and paying the cost for it.

Ok, so I should get a TURN server or my app may not work properly, right?
But why does it sometimes work and sometimes not? Luck?
  
I appreciate any help. Please also point me to other documents that you think could contain the answer.


Ok ok I will check this information. 

Thanks in advance to everyone.

sorry if this turned into a bit of a rant :-)

Not at all, Philipp, THANKS A LOT. I will go through your comments and the information that you gave me.

Once again, thanks very much for your help. I hope to have this working soon, or at least to learn along the way hahaha. =)

Regards.
Eliomar
 

Suman Cherukuri

Aug 12, 2020, 12:52:59 PM
to discuss-webrtc
I am also running into the exact same issue. I am developing a native iOS app with the native WebRTC library. Everything works great on wifi, even when the peers are not on the same wifi network. But when I use the Sprint network on my phone and wifi on another phone, it works sometimes but not all the time.

I will follow the suggestions below and see if my signaling and ice candidate exchange need to be adjusted.

Thanks,

--Suman

Eliomar Conde

Aug 12, 2020, 3:25:17 PM
to discuss-webrtc
Hey Suman! Ahhh ok ok, I understand. I hope you can solve it with the things we have discussed here; that would be great news.
Could you share your results here?

Suman Cherukuri

Aug 12, 2020, 6:45:04 PM
to discuss-webrtc
I believe I solved the issue in my system. I am not sure if it is the same issue you are having, but the symptoms are the same. Here is what's happening in my app:

In both peers, I maintain the ICE candidates in a message queue. I drain the queue when the remote SDP is set. When a peer receives an offer, I send an answer, set the remote description, and drain the queue to add the ICE candidates to the peer connection. Everything works as long as the connections are fast enough (on the same wi-fi network or on two different wi-fi networks).

But creating and sending the answer and setting the remote description happen asynchronously. So, on a slower network, my code was adding the ICE candidates to the peer connection before sending the answer and setting the remote description. Once I made sure that the ICE candidates are added on both sides only after the remote descriptions are set, I got 100% success when one device is on the Sprint network and the other is on a wi-fi network.
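
The fix described above can be sketched roughly as follows. This is not Suman's actual code (which is native iOS); it is a TypeScript illustration of the same idea, with `RemoteCandidateBuffer`, `PeerLike`, and the fake peer all being hypothetical names. `PeerLike` only lists the two peer connection methods the sketch relies on, `setRemoteDescription` and `addIceCandidate`.

```typescript
// Minimal shapes so the sketch runs outside a browser.
interface CandidateInit {
  candidate?: string;
  sdpMid?: string | null;
  sdpMLineIndex?: number | null;
}
interface DescriptionInit {
  type: "offer" | "answer";
  sdp?: string;
}
interface PeerLike {
  setRemoteDescription(desc: DescriptionInit): Promise<void>;
  addIceCandidate(candidate: CandidateInit): Promise<void>;
}

// Holds remote candidates back until the remote description has been applied.
class RemoteCandidateBuffer {
  private pc: PeerLike;
  private pending: CandidateInit[] = [];
  private remoteDescriptionSet = false;

  constructor(pc: PeerLike) {
    this.pc = pc;
  }

  // Call for every candidate that arrives over signaling.
  async onRemoteCandidate(candidate: CandidateInit): Promise<void> {
    if (!this.remoteDescriptionSet) {
      this.pending.push(candidate); // too early: queue it
      return;
    }
    await this.pc.addIceCandidate(candidate);
  }

  // Call when the remote offer or answer arrives over signaling.
  async onRemoteDescription(desc: DescriptionInit): Promise<void> {
    await this.pc.setRemoteDescription(desc);
    this.remoteDescriptionSet = true;
    // Drain in arrival order, only after setRemoteDescription has resolved.
    const queued = this.pending;
    this.pending = [];
    for (const candidate of queued) {
      await this.pc.addIceCandidate(candidate);
    }
  }
}

// Fake peer that records the order of calls (stands in for a real connection).
const calls: string[] = [];
const fakePeer: PeerLike = {
  setRemoteDescription: async (desc) => {
    calls.push("setRemoteDescription:" + desc.type);
  },
  addIceCandidate: async (c) => {
    calls.push("addIceCandidate:" + c.candidate);
  },
};

const buffer = new RemoteCandidateBuffer(fakePeer);
const demo = (async () => {
  await buffer.onRemoteCandidate({ candidate: "early-1" }); // arrives before the offer
  await buffer.onRemoteDescription({ type: "offer" });
  await buffer.onRemoteCandidate({ candidate: "late-2" });
  console.log(calls);
  return calls;
})();
```

The point of the design is that the signaling handler never calls `addIceCandidate` directly; everything goes through the buffer, so the ordering no longer depends on network speed.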

However, in this environment, I am getting the connection only through a TURN server. The peers are not getting a P2P connection through STUN. So somehow the Sprint network is not allowing UDP traffic or something like that. I will have to figure that out. Since my app is going to be free for a while, I need to make sure I don't incur too much cost on my TURN server.

Hope it helps,

--Suman

Eliomar Conde

Aug 13, 2020, 12:50:41 AM
to discuss-webrtc
Suman, I think that may be my problem too, but at the end of candidate gathering I check the SDP descriptions, both local and remote, and both have all the gathered and sent candidates.

How can you see which candidate WebRTC is using to communicate? Sorry, I don't think I know how to see that, because in the remote or local SDP we just see all the gathered candidates. Is there any way to see which candidate WebRTC is using to transfer the media/datachannel data?
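
One way to answer this in a browser is the statistics API: `pc.getStats()` resolves to a report whose "transport" entry carries a `selectedCandidatePairId`, and the referenced "candidate-pair" entry points at the local and remote candidates. The helper below is a sketch of walking that structure. `findSelectedPair`, `StatLike`, and `StatsReportLike` are illustrative names, and the sample data is invented so the snippet runs without a browser.

```typescript
// Loose shapes for stats entries; real reports carry many more fields.
interface StatLike {
  id: string;
  type: string;
  [key: string]: unknown;
}
// Anything with a forEach over stats entries (the browser's report or an array).
interface StatsReportLike {
  forEach(callback: (stat: StatLike) => void): void;
}
interface SelectedPair {
  localType: string;
  remoteType: string;
}

function findSelectedPair(report: StatsReportLike): SelectedPair | null {
  const byId: { [id: string]: StatLike } = {};
  report.forEach((stat) => {
    byId[stat.id] = stat;
  });

  let pair: StatLike | undefined;
  // Preferred: the transport entry names the selected pair.
  for (const id of Object.keys(byId)) {
    const stat = byId[id];
    const pairId = stat.selectedCandidatePairId;
    if (stat.type === "transport" && typeof pairId === "string") {
      pair = byId[pairId];
    }
  }
  // Fallback: a nominated pair whose connectivity check succeeded.
  if (!pair) {
    for (const id of Object.keys(byId)) {
      const stat = byId[id];
      if (stat.type === "candidate-pair" && stat.nominated === true && stat.state === "succeeded") {
        pair = stat;
      }
    }
  }
  if (!pair) return null;

  const local = byId[String(pair.localCandidateId)];
  const remote = byId[String(pair.remoteCandidateId)];
  if (!local || !remote) return null;
  return {
    localType: String(local.candidateType),
    remoteType: String(remote.candidateType),
  };
}

// Invented sample data shaped like a stats report.
const sampleReport: StatLike[] = [
  { id: "T1", type: "transport", selectedCandidatePairId: "P1" },
  { id: "P1", type: "candidate-pair", localCandidateId: "L1", remoteCandidateId: "R1", nominated: true, state: "succeeded" },
  { id: "L1", type: "local-candidate", candidateType: "srflx" },
  { id: "R1", type: "remote-candidate", candidateType: "prflx" },
];
const selected = findSelectedPair(sampleReport);
console.log(selected);
// In a browser, something like: findSelectedPair(await pc.getStats())
```

A `relay` type on either side of the selected pair means the traffic goes through TURN; `host`, `srflx`, or `prflx` on both sides means a direct path. The same information is visible interactively on the webrtc-internals page.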

Eliomar.

Harald Alvestrand

Aug 13, 2020, 2:57:02 AM
to discuss...@googlegroups.com
addIceCandidate returns a promise.
You should always check whether that promise is rejected or not (such as by writing "await addIceCandidate()").

We see that in the wild, a certain percentage (less than 1%) of candidates are rejected because local or remote SDP are not set; it is likely that this happens because the app writers haven't made sufficiently sure that the ICE candidate is only added after local and remote SDP are set.
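
A minimal sketch of such a check follows. `addCandidateChecked` and `CandidateSink` are hypothetical names, and the fake object only imitates the "remote description not set yet" rejection so the snippet can run outside a browser; real error messages will differ.

```typescript
// Minimal shapes so the sketch runs outside a browser.
interface CandidateInit {
  candidate?: string;
  sdpMid?: string | null;
  sdpMLineIndex?: number | null;
}
interface CandidateSink {
  addIceCandidate(candidate: CandidateInit): Promise<void>;
}

// Awaits addIceCandidate and reports whether the candidate was accepted,
// instead of letting a rejection disappear silently.
async function addCandidateChecked(
  pc: CandidateSink,
  candidate: CandidateInit
): Promise<boolean> {
  try {
    await pc.addIceCandidate(candidate);
    return true;
  } catch (err) {
    console.warn("addIceCandidate rejected:", candidate.candidate, err);
    return false;
  }
}

// Fake sink: rejects until the (simulated) remote description is set.
let remoteDescriptionSet = false;
const fakeSink: CandidateSink = {
  addIceCandidate: (_candidate) =>
    remoteDescriptionSet
      ? Promise.resolve()
      : Promise.reject(new Error("remote description not set")),
};

const checkedDemo = (async () => {
  const early = await addCandidateChecked(fakeSink, { candidate: "early" });
  remoteDescriptionSet = true;
  const late = await addCandidateChecked(fakeSink, { candidate: "late" });
  console.log({ early, late });
  return [early, late];
})();
```

Logging the rejection at least makes the ordering problem visible; the more robust fix is to queue candidates until the remote description is set.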




Suman Cherukuri

Aug 13, 2020, 1:13:39 PM
to discuss-webrtc
Hi Eliomar, I am not sure which candidate is used. Generally it goes by the network cost. But out of all the candidates with the same cost, I don't know which gets picked. I will see if I can find out by debugging the WebRTC code.

--Suman

Eliomar Conde

Aug 13, 2020, 3:03:23 PM
to discuss-webrtc
Harald, thanks for taking the time to answer us. 

On Thursday, August 13, 2020 at 3:57:02 AM UTC-3 Harald Alvestrand wrote:
addIceCandidate returns a promise.
You should always check whether that promise is rejected or not (such as by writing "await addIceCandidate()").

We see that in the wild, a certain percentage (less than 1%) of candidates are rejected because local or remote SDP are not set; it is likely that this happens because the app writers haven't made sufficiently sure that the ICE candidate is only added after local and remote SDP are set.

Ahhhh now I see. So I will make sure that I add candidates only after I set the local and remote SDP.
Great tip.

 

Suman Cherukuri

Aug 15, 2020, 3:32:40 PM
to discuss-webrtc
When my phone is on the Sprint network (Symmetric NAT) and my other device, an iPad, is on my wi-fi, why am I not getting a STUN connection? It always relies on a TURN server. My understanding is that not all symmetric NATs require a TURN server. TURN is only necessary if one side of the connection is symmetric and the other side is either symmetric or port restricted cone.

I am using native webrtc libraries in my iOS app.

Thanks in advance for any suggestions,

--Suman

Suman Cherukuri

Aug 16, 2020, 2:53:53 PM
to discuss-webrtc
Debugging through the native code, the STUN ping is failing on the side behind the symmetric NAT. However, if I ping the remote host from a terminal, it succeeds.

I am a bit puzzled about why the ping fails in the WebRTC native code while it succeeds in a terminal.

--Suman

Sean DuBois

Aug 16, 2020, 5:39:23 PM
to discuss...@googlegroups.com
Hi Suman,

Things are a little more nuanced than this thread suggests. Sorry this
might not give you the answer you are looking for, but I think the info
is good long term.

I would avoid using the term 'Symmetric NAT'. There are a few different
attributes, and this doesn't do a great job of describing them. You can
read about them all in https://tools.ietf.org/html/rfc4787

The big things that matter are
* Mapping reuse behavior (Do you get a new mapping for every request?)
* Filtering (What remote hosts can use your mapping?)
* Refresh (How long does your mapping last)

When working with developers that use Pion I have heard of lots of weird
NAT behaviors as well.

* LTE Modem would change NAT Types depending on location?
* One ISP would only give you N NAT Mappings, and you would get a
more restrictive Mapping/Filtering if you created too many.

Here are some other things that might be worth checking out
* https://groups.google.com/forum/#!msg/discuss-webrtc/t7xfb8jHcsM/_YmTXMsMCAAJ
* https://github.com/pion/stun/tree/master/cmd/stun-nat-behaviour

I am also working on an Open Source book to try and make this stuff
easier to learn. https://webrtcforthecurious.com/docs/03-connecting/

Suman Cherukuri

Aug 17, 2020, 7:39:23 PM
to discuss-webrtc
Thanks Sean, I did go through your book in progress. Thank you.

When you say mapping, did you mean port numbers for the srflx host? Yes, I get a different port and the same host IP, every time I ask the STUN server. I added two STUN servers, I get two different ports. 
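
The observation above (same local socket, two STUN servers, two different public ports) is essentially a test of the "mapping reuse" attribute from RFC 4787. A sketch of that comparison follows. `guessMappingBehavior` and `ReflexiveObservation` are illustrative names, the addresses are made-up documentation addresses, and how the observations get collected is left out (a STUN client such as the stun-nat-behaviour tool linked earlier could provide them). The comparison only says something when the STUN servers have different IP addresses.

```typescript
// One srflx result: what a given STUN server reported for a given local socket.
interface ReflexiveObservation {
  stunServer: string;    // hypothetical label for the server that was queried
  localAddress: string;  // raddr in the candidate line
  localPort: number;     // rport in the candidate line
  mappedAddress: string; // public address reported by the STUN server
  mappedPort: number;    // public port reported by the STUN server
}

// "endpoint-dependent" covers both Address-Dependent and
// Address and Port-Dependent mapping from RFC 4787.
type MappingGuess = "endpoint-independent" | "endpoint-dependent" | "unknown";

function guessMappingBehavior(observations: ReflexiveObservation[]): MappingGuess {
  const serversByLocal: { [local: string]: string[] } = {};
  const mappingsByLocal: { [local: string]: string[] } = {};

  // Group servers and public mappings by local socket.
  for (const o of observations) {
    const local = o.localAddress + ":" + o.localPort;
    const mapped = o.mappedAddress + ":" + o.mappedPort;
    const servers = serversByLocal[local] || (serversByLocal[local] = []);
    const mappings = mappingsByLocal[local] || (mappingsByLocal[local] = []);
    if (servers.indexOf(o.stunServer) === -1) servers.push(o.stunServer);
    if (mappings.indexOf(mapped) === -1) mappings.push(mapped);
  }

  let sawComparableSocket = false;
  for (const local of Object.keys(serversByLocal)) {
    if (serversByLocal[local].length < 2) continue; // need two servers to compare
    sawComparableSocket = true;
    // Same local socket, different public mapping per destination.
    if (mappingsByLocal[local].length > 1) return "endpoint-dependent";
  }
  return sawComparableSocket ? "endpoint-independent" : "unknown";
}

// Invented data resembling the cellular case: one socket, two public ports.
const cellular = guessMappingBehavior([
  { stunServer: "stun-a", localAddress: "10.0.0.2", localPort: 50000, mappedAddress: "203.0.113.7", mappedPort: 4263 },
  { stunServer: "stun-b", localAddress: "10.0.0.2", localPort: 50000, mappedAddress: "203.0.113.7", mappedPort: 4271 },
]);
// Invented data resembling a home router: one socket, one public mapping.
const home = guessMappingBehavior([
  { stunServer: "stun-a", localAddress: "192.168.1.5", localPort: 38514, mappedAddress: "198.51.100.9", mappedPort: 38514 },
  { stunServer: "stun-b", localAddress: "192.168.1.5", localPort: 38514, mappedAddress: "198.51.100.9", mappedPort: 38514 },
]);
console.log({ cellular, home });
```

With an endpoint-dependent mapping, the port a STUN server reports is not the port the remote peer will see, which is why the srflx candidate is of little use for hole punching in that case. Filtering and refresh behavior need separate tests that this sketch does not cover.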

I am not sure about filtering and refresh. Can you please tell me how to find them?

I am still puzzled about why the device on cellular (that gets different ports from different STUN calls) cannot ping and bind to the remote device that has a constant port.

Thanks,

--Suman

Eliomar Conde

Aug 18, 2020, 1:16:56 AM
to discuss-webrtc
Ok, after several tries I noticed that in order to establish communication between my wifi PC and my phone on the 4G LTE network I need a TURN server, because apparently I cannot reach my phone p2p from my PC.
But here is the curious thing: I tested it with a friend, and with him I am able to establish a good p2p connection without using a TURN server. In that case the candidate is not srflx but peer reflexive (prflx) instead. So what does that mean?

Why is that happening? Why is my friend able to reach my phone on 4G LTE with a p2p link (without using TURN) while I am not?
Of course, if I try to connect to my friend from my wifi PC it connects successfully. So, should I be able to reach my phone without using TURN?
Can I say that my phone on 4G LTE is behind a "Symmetric NAT"? Or, going by what Sean said, is it a Symmetric NAT?

I guess so, because in the SDP we can see that the srflx candidate is the following:

a=candidate:1111403267 1 udp 1686052607 177.58.248.21 4263 typ srflx raddr 100.71.108.148 rport 42855 generation 0 network-id 1 network-cost 900

and here we can see that the ports are not the same (maybe symmetric NAT behavior according to the Kurento documentation that Philipp sent: https://doc-kurento.readthedocs.io/en/6.14.0/knowledge/nat.html)

and if we compare that candidate with the srflx candidate from my wifi PC, we can see that the public address and the private one share the same port (cone behavior):

a=candidate:3822829061 1 udp 1685921535 178.183.50.70 38514 typ srflx raddr 192.168.25.50 rport 38514 generation 0 network-id 1 network-cost 10

Is that normal behavior? Should I not be able to reach my phone on 4G LTE? Why is the port of my srflx candidate 4263 and not a higher value?

When you say mapping, did you mean port numbers for the srflx host? Yes, I get a different port and the same host IP, every time I ask the STUN server. I added two STUN servers, I get two different ports. 

I also get the same behavior, but I don't know why my ports are in such a low range, 4200-4300. Any explanation for that?
 
Things are a little more nuanced than this thread suggests. Sorry this
might not give you the answer you are looking for, but I think the info
is good long term.

I would avoid using the term 'Symmetric NAT'. There are a few different
attributes, and this doesn't do a great job of describing them. You can
read about them all in https://tools.ietf.org/html/rfc4787

The big things that matter are
* Mapping reuse behavior (Do you get a new mapping for every request?)
* Filtering (What remote hosts can use your mapping?)
* Refresh (How long does your mapping last)

When working with developers that use Pion I have heard of lots of weird
NAT behaviors as well.

* LTE Modem would change NAT Types depending on location?
* One ISP would only give you N NAT Mappings, and you would get a
  more restrictive Mapping/Filtering if you created too many.

Here are some other things that might be worth checking out
* https://groups.google.com/forum/#!msg/discuss-webrtc/t7xfb8jHcsM/_YmTXMsMCAAJ
* https://github.com/pion/stun/tree/master/cmd/stun-nat-behaviour

I am also working on an Open Source book to try and make this stuff
easier to learn. https://webrtcforthecurious.com/docs/03-connecting/

Thanks for the very useful information Sean.  

Eliomar.

