Load balance and auto scaling for Turn server on AWS


Sachin

Mar 25, 2013, 8:56:58 AM
to discuss...@googlegroups.com
Hi All,

I have a question about deploying a TURN server on Amazon's cloud. Some posts on this forum mention running a TURN server on AWS with port forwarding enabled.

I have the following doubts about running a TURN server in the cloud.

1. The TURN server will be behind NAT on EC2, so port forwarding is required on the NAT.
    Each TURN connection uses two ports: one for the TURN server transport address and one for the relayed transport address.
    All ports should be mapped one-to-one so that connections succeed.
    Is this understanding correct?

2. Now suppose I want to enable auto scaling or load balancing. How can this be achieved?
    I do not understand how a load balancer would work in this scenario.
    Is there any link or document on the internet that covers this?
    I am not very familiar with the concepts of load balancing and scaling, and I am still reading up on them.
    Can somebody point me to links that shed light on such a scenario?

Thanks in Advance

Jonathan Ekwempu

Mar 25, 2013, 11:56:28 AM
to discuss...@googlegroups.com
Hi Sachin,

I can answer question 2. AWS lets you attach Elastic IP (EIP) addresses to your servers. When specifying servers in WebRTC, you can list several IPs and ports in the server configuration. That way you can scale your TURN servers without actually using a load balancer (ELB).
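
A minimal sketch of such a configuration, assuming three TURN instances that each have their own Elastic IP. The addresses (documentation-range IPs), the port, and the credentials are placeholders and do not come from this thread. `urls` is the field name in the current WebRTC spec; older Chrome builds used `url`.

```typescript
// Local shape mirroring the standard RTCIceServer dictionary, declared here
// so the sketch compiles without DOM typings.
interface IceServer {
  urls: string;
  username?: string;
  credential?: string;
}

// Hypothetical Elastic IPs, one per TURN instance (placeholders).
const turnAddresses: string[] = ["203.0.113.10", "203.0.113.11", "203.0.113.12"];

// Build one iceServers entry per TURN instance.
// Port 3478 is the default TURN port; adjust if your servers listen elsewhere.
function buildIceServers(addresses: string[], username: string, credential: string): IceServer[] {
  return addresses.map((ip) => ({
    urls: `turn:${ip}:3478?transport=udp`,
    username,
    credential,
  }));
}

const iceServers: IceServer[] = buildIceServers(turnAddresses, "demo-user", "demo-pass");
// In a browser this would be passed as: new RTCPeerConnection({ iceServers })
```

Adding capacity then means starting another instance, attaching an EIP, and appending its address to the list the web app hands out.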

Thanks,
Jonathan Ekwempu

silviu.cpp

Apr 2, 2013, 2:46:40 PM
to discuss...@googlegroups.com
Hello,

I have tested the following scenario:

I deployed three TURN servers on Amazon: one in Singapore, one in Ireland, and one in the USA.
My location is Bucharest. The ping RTT to each location is:
  • Singapore - 338 ms
  • USA - 132 ms
  • Ireland - 78 ms

If I list the servers in the WebRTC server config in the order Singapore, USA, Ireland, the call always connects through the Singapore one, which is the worst for my location. Basically, WebRTC tries the servers in the order you provide them.

Is this the expected behavior? I thought it could somehow detect the best route by measuring the delay with a ping.

Kind regards,
Silviu

alen

Apr 2, 2013, 3:36:40 PM
to discuss...@googlegroups.com
Silviu,

I'd be interested in what you find - I've looked into balancing load on TURN servers and will have to come up with a solution soon.

So far I have 3 TURN servers deployed just for failover; they are all listed in the iceServers param to the peer connection. The 1st is always chosen first. One of the simplest approaches I was looking at is a bandwidth or user limit configured on each TURN server, so that once the limit is reached, clients just fail over to the next one. Another simple way is randomizing the list order in the WebRTC app.
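
The randomizing idea could look roughly like this. It is only a sketch: the server URLs are hypothetical, and the random source is injectable purely so the function can be tested.

```typescript
// Fisher-Yates shuffle that returns a new array and leaves the input untouched.
// `random` defaults to Math.random and must return a value in [0, 1).
function shuffleServers<T>(servers: readonly T[], random: () => number = Math.random): T[] {
  const result = servers.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}

// Hypothetical TURN URLs; each page load gets its own ordering.
const turnUrls: string[] = [
  "turn:turn1.example.org:3478",
  "turn:turn2.example.org:3478",
  "turn:turn3.example.org:3478",
];
const randomizedUrls: string[] = shuffleServers(turnUrls);
```

Since the browser tries the first entry first, shuffling per client spreads load roughly evenly across servers, at the cost of ignoring proximity.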

If you are deploying several TURN servers with geographic proximity I would say ideally you would detect location and re-arrange them in the list (either in the WebRTC app or web server) to always have the closest one listed first, as you've probably figured out.
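
As a rough illustration of the location-based re-ordering, the sketch below sorts the regions by great-circle distance from the client. The coordinates are approximate and only illustrative ("USA" is assumed to be near Washington, DC), and obtaining the client's location is left out entirely.

```typescript
interface Region {
  name: string;
  lat: number; // degrees
  lon: number; // degrees
}

// Approximate, illustrative coordinates (assumptions, not from the thread).
const regions: Region[] = [
  { name: "Singapore", lat: 1.35, lon: 103.82 },
  { name: "USA", lat: 38.9, lon: -77.04 },
  { name: "Ireland", lat: 53.35, lon: -6.26 },
];

// Great-circle distance in kilometres using the haversine formula.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const sinLat = Math.sin(toRad(lat2 - lat1) / 2);
  const sinLon = Math.sin(toRad(lon2 - lon1) / 2);
  const a = sinLat * sinLat + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * sinLon * sinLon;
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

// Return a copy of the candidates with the closest region first.
function orderByProximity(clientLat: number, clientLon: number, candidates: readonly Region[]): Region[] {
  return candidates
    .slice()
    .sort(
      (a, b) =>
        haversineKm(clientLat, clientLon, a.lat, a.lon) -
        haversineKm(clientLat, clientLon, b.lat, b.lon),
    );
}

// Client in Bucharest (approximately 44.43 N, 26.10 E).
const orderedFromBucharest: string[] = orderByProximity(44.43, 26.1, regions).map((r) => r.name);
```

Geographic distance is only a proxy for network delay, so this ordering is a starting point rather than a guarantee.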

Regards,
Alen

Caragea Silviu

Apr 2, 2013, 3:53:50 PM
to discuss-webrtc
If you are not on Amazon and have your own servers, you can use LVS (IP tunneling or direct routing) to balance them. In theory it works with UDP (I have not tested that, but I am not expecting problems); I have tested it only with TCP, and it works very nicely.

Only the packets on the internal address go through the LB. Those on the external address go from the TURN node directly to the other client.

Regarding "If you are deploying several TURN servers with geographic proximity I would say ideally you would detect location and re-arrange them in the list": yes, I agree that this can be a solution. But sending some STUN ping requests to the servers and ordering them by reply time would probably be a better option.
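
A small sketch of the ordering step only, assuming the reply times have already been measured somehow (the probing itself is not shown, and the server URLs are hypothetical). The RTT values are the ping figures quoted earlier in this thread.

```typescript
interface Measurement {
  url: string;
  rttMs: number | null; // null means the server did not reply
}

// Fastest responders first; servers that never replied are kept but moved to the end.
function orderByRtt(measurements: readonly Measurement[]): string[] {
  const responsive = measurements
    .filter((m) => m.rttMs !== null)
    .sort((a, b) => (a.rttMs as number) - (b.rttMs as number));
  const silent = measurements.filter((m) => m.rttMs === null);
  return responsive.concat(silent).map((m) => m.url);
}

// Hypothetical URLs paired with the RTTs measured from Bucharest.
const measured: Measurement[] = [
  { url: "turn:sg.example.org:3478", rttMs: 338 },
  { url: "turn:us.example.org:3478", rttMs: 132 },
  { url: "turn:ie.example.org:3478", rttMs: 78 },
];
const orderedByRtt: string[] = orderByRtt(measured);
```

Keeping unresponsive servers at the end, rather than dropping them, preserves them as a last resort if the probe result was a fluke.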




--
 
---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Oleg Moskalenko

Apr 2, 2013, 4:27:23 PM
to discuss...@googlegroups.com
If you are using clusters and/or load balancing and you are not using RTCP allocation in the TURN server, then the deployment should be relatively straightforward.

But if RTCP is used (the "even port" feature in TURN), the load balancing must be of a special type, so that network traffic from the same IP address always ends up at the same TURN server. So the "persistence" of the load balancer must be keyed not on IP:port pairs but on whole IP addresses.

That may be an unusual configuration, and it may be a problem. To overcome it, a special implementation of the TURN server is required.

In our rfc5766-turn-server project we plan to add clustering support for RTCP sessions; hopefully it will be done some time this spring.
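
To illustrate what persistence on the whole IP address means, here is a toy backend selector that hashes only the client's source IP and deliberately ignores the source port. It is a sketch under stated assumptions, not how LVS or rfc5766-turn-server does it: the backend names are hypothetical, and a plain modulo hash reshuffles clients whenever the backend list changes.

```typescript
interface PacketSource {
  srcIp: string;
  srcPort: number;
}

// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Choose a backend from the source IP alone, so every port used by one client
// (for example the RTP and RTCP flows) lands on the same TURN server.
function pickBackend(source: PacketSource, backends: readonly string[]): string {
  if (backends.length === 0) {
    throw new Error("no backends configured");
  }
  return backends[fnv1a(source.srcIp) % backends.length] as string;
}

// Hypothetical backend names.
const turnBackends: string[] = ["turn-a", "turn-b", "turn-c"];
const rtpBackend: string = pickBackend({ srcIp: "198.51.100.7", srcPort: 50000 }, turnBackends);
const rtcpBackend: string = pickBackend({ srcIp: "198.51.100.7", srcPort: 50001 }, turnBackends);
```

A balancer keyed on IP:port pairs could send the two flows above to different servers, which is exactly the case the "even port" feature cannot tolerate.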

Alen IIkov

Apr 2, 2013, 4:27:55 PM
to discuss...@googlegroups.com
On Tue, Apr 2, 2013 at 12:53 PM, Caragea Silviu <silvi...@gmail.com> wrote:
> In case you are not in Amazon and have your own servers you can use LVS (Ip
> tunneling or direct routing) to balance them. Theoretically it's working
> with UDP (I hadn't tested but I'm not expecting problems) but I tested only
> with TCP and it's very nice.

Thanks for the idea! Wouldn't you expect to have problems with
non-session based stuff like UDP going everywhere with layer 4
switching? I was assuming I'd have to select a server at a higher
level, but maybe LVS has enough smarts to deal with that.


> Only the packet's on internal address goes to LB . The one on external
> address goes from turn node to the other client.
>
> And regarding "If you are deploying several TURN servers with geographic
> proximity I would say ideally you would detect location and re-arrange them
> in the list" yes I agree that this can be a solution. But probably sending
> some stun ping requests to them and ordering by reply time will be a better
> option.

Agreed, the more intelligent the choice the better.

Alen

Vikas

Apr 4, 2013, 2:57:51 AM
to discuss-webrtc
Hi Silviu,

Thanks for the feedback, you can track it in issue 1578 (
https://code.google.com/p/webrtc/issues/detail?id=1578 )

/Vikas

Warren McDonald

Apr 8, 2013, 9:51:28 PM
to discuss...@googlegroups.com
Hi all,

We have just discovered a bug, or at least undesirable behaviour, in the handling of STUN and TURN server members in the rtc.SERVER array. It appears that array members are tried in sequence only until a TURN server is reached. Once the first TURN server in the list is reached, requests continue to be sent to that server's address even if it does not reply and there are other TURN members in the array that have not been tried yet.

This has just been tested on Stable and the current Canary, with the same result.

We were trying to test the difference in behaviour between two TURN server versions on different hosts. We thought that simply taking down all other configured STUN and TURN servers would leave the browser with only the currently running server. We planned to test on the first host, then take that server down, bring up the new server on the other host, and repeat the tests. This approach, which would also have tested our redundancy configuration for TURN, failed miserably.

Net result: multiple STUN servers can give redundancy, but multiple TURN servers do not. Even worse, a TURN server placed too early in the list stops the STUN servers that follow it from being used.

Cheers,

Warren

Warren McDonald

Apr 8, 2013, 11:10:00 PM
to discuss...@googlegroups.com
I may have gone off half primed here, but the situation is still bad.

I have just done a few more tests, and it seems the behaviour is the same for STUN: the first configured STUN server continues to be tried even if it does not respond. The confounding behaviour that led me to my previous conclusion is that a configured TURN server overrides a STUN server. In that case Chrome skips the STUN server entirely and goes straight to the first configured TURN server, which will be the only server tried, regardless of its operational state.

So now my experiences show that there is no point in having any more than one STUN server in the array, and no point in having that, if a TURN server is listed as well. This seems to fly in the face of the intent of having an array of ICE servers. 

As you can see from my history of posts, TURN is a hot spot for me. My current project requires the highest connection success rate be achieved across a population. For this to succeed we need to go well past what can be achieved with just STUN.

My concern is that this area seems to be a black art, rather than a set of clear expectations of system behaviour that are subject to regular regression testing. Unless we can get past this stage, we will only contribute to bad experiences, which will hurt community acceptance of WebRTC as a potential solution for widespread government or corporate services.

The rfc5766 turnserver project has made many changes to accommodate Chrome behaviour that is outside the relevant RFCs. TURN TCP and TLS support has been pushed back from rel 25 to 28.
While I do not want to sound like an ungrateful community member in receipt of a very nice gift from Google, I am very concerned that this critical area is not getting the right level of attention and that the project will suffer as a result.

Warren

Mallinath Bareddy

Apr 9, 2013, 1:20:57 AM
to discuss...@googlegroups.com
On Mon, Apr 8, 2013 at 8:10 PM, Warren McDonald <warren....@gmail.com> wrote:
> I may have gone off half primed here, but the situation is still bad.
>
> I have just done a few more tests, and it seems the behaviour is the same for STUN: the first configured STUN server continues to be tried even if it does not respond. The confounding behaviour that led me to my previous conclusion is that a configured TURN server overrides a STUN server. In that case Chrome skips the STUN server entirely and goes straight to the first configured TURN server, which will be the only server tried, regardless of its operational state.
>
> So now my experiences show that there is no point in having any more than one STUN server in the array, and no point in having that, if a TURN server is listed as well. This seems to fly in the face of the intent of having an array of ICE servers.

Your observations are correct: we just pick the first entry in the configured TURN server list, and we use it for STUN as well.

> As you can see from my history of posts, TURN is a hot spot for me. My current project requires the highest connection success rate be achieved across a population. For this to succeed we need to go well past what can be achieved with just STUN.
>
> My concern is that this area seems to be a black art, rather than a set of clear expectations of system behaviour that are subject to regular regression testing. Unless we can get past this stage, we will only contribute to bad experiences, which will hurt community acceptance of WebRTC as a potential solution for widespread government or corporate services.
>
> The rfc5766 turnserver project has made many changes to accommodate Chrome behaviour that is outside the relevant RFCs. TURN TCP and TLS support has been pushed back from rel 25 to 28.
> While I do not want to sound like an ungrateful community member in receipt of a very nice gift from Google, I am very concerned that this critical area is not getting the right level of attention and that the project will suffer as a result.

We are working on adding TCP and TLS support in M28, and also on using all TURN servers provided through configuration. Stay tuned.

> Warren

Oleg Moskalenko

Apr 9, 2013, 3:36:39 AM
to discuss...@googlegroups.com
Hi Warren

I hope that this problem will be fixed by WebRTC folks.

If not, you can think about a setup with a single "front-end" IP and multiple TURN/STUN servers behind it (a sort of real cluster-like functionality). It will require some creative network engineering and maybe changes in the TURN/STUN servers. However, like any other solution with a single IP, it will not provide absolute protection against network failures.

I am sure a solution eventually will be found for your problem. Let me know if rfc5766-turn-server project can help somehow.

Regards,
Oleg

Justin Uberti

Apr 9, 2013, 7:16:14 PM
to discuss-webrtc
This is the first time anybody has raised this issue as important to them. We have many, many things on the list for each milestone, and we can't do all of them. So we work on the ones that we think are the most important.

We did not feel this issue was important because it has a fairly easy workaround - in your web frontend, you can use geo information to pick the best TURN server and return its address.


Warren McDonald

Apr 11, 2013, 8:34:27 PM
to discuss...@googlegroups.com
Hi Justin,

as I prefaced, my comments would probably sound ungrateful. I am not trying to say that Google are not delivering; I think the progress is astounding. I am frustrated by the perceived lack of clarity about the TURN specifications and how they might or should be supported by browser implementations. By referring to this as a "black art" I mean that not much open knowledge seems to exist in this domain.

I certainly stand by my comments about its importance to the success of the WebRTC project. I may be biased toward the more difficult end of the spectrum here, but that is just a reflection of the health sector in which I work. Our work focuses on getting the technology most likely to support broad adoption across the community, but where one end of the call is almost always going to be in a tightly managed and locked-down network environment. Our experience so far is that this scenario is far more dependent on TURN than we would have hoped. So the percentage of calls that will rely on TURN is significant, making us much more reliant on redundant deployment configurations.

I understand the opportunities for front-end logic that could determine the best TURN server to use based on geography, response time, etc. I did not expect to also have to provide the failover redundancy. Of course a response-time test would rule out failed servers, but we would then have to factor in wait times, retries, etc. As STUN and TURN are network infrastructure services, similar to DNS, many would expect an array of servers to be treated in a similar fashion by the application requiring the service. Others may see TURN as a proxy service and adopt the view that it should be as broken as browser support for a failed HTTP proxy.

As usual in WebRTC conversations, we are searching for the right boundary between what the browser and the web application implement. As browsers move toward providing more OS-like functions, for Chromebooks etc., I feel this basic redundancy function belongs in the browser.


Cheers,

Warren

Warren McDonald

Apr 14, 2013, 10:38:18 PM
to discuss...@googlegroups.com
Thanks Mallinath,

that's great news.

Warren

Justin Uberti

Apr 16, 2013, 6:15:00 PM
to discuss-webrtc
On Thu, Apr 11, 2013 at 5:34 PM, Warren McDonald <warren....@gmail.com> wrote:
> Hi Justin,
>
> as I prefaced, my comments would probably sound ungrateful. I am not trying to say that Google are not delivering; I think the progress is astounding. I am frustrated by the perceived lack of clarity about the TURN specifications and how they might or should be supported by browser implementations. By referring to this as a "black art" I mean that not much open knowledge seems to exist in this domain.
>
> I certainly stand by my comments about its importance to the success of the WebRTC project. I may be biased toward the more difficult end of the spectrum here, but that is just a reflection of the health sector in which I work. Our work focuses on getting the technology most likely to support broad adoption across the community, but where one end of the call is almost always going to be in a tightly managed and locked-down network environment. Our experience so far is that this scenario is far more dependent on TURN than we would have hoped. So the percentage of calls that will rely on TURN is significant, making us much more reliant on redundant deployment configurations.
>
> I understand the opportunities for front-end logic that could determine the best TURN server to use based on geography, response time, etc. I did not expect to also have to provide the failover redundancy. Of course a response-time test would rule out failed servers, but we would then have to factor in wait times, retries, etc. As STUN and TURN are network infrastructure services, similar to DNS, many would expect an array of servers to be treated in a similar fashion by the application requiring the service. Others may see TURN as a proxy service and adopt the view that it should be as broken as browser support for a failed HTTP proxy.
>
> As usual in WebRTC conversations, we are searching for the right boundary between what the browser and the web application implement. As browsers move toward providing more OS-like functions, for Chromebooks etc., I feel this basic redundancy function belongs in the browser.

We will support this behavior in the future.

Justin Uberti

May 10, 2013, 5:22:30 PM
to discuss-webrtc
This is now available in Canary.

Warren McDonald

May 10, 2013, 5:46:22 PM
to discuss...@googlegroups.com

Thanks Justin.

We will set up a round of tests for this and the TCP support.

Warren
