jitter tolerance of bbr. Is it needed?


Tony James

Jul 11, 2017, 2:34:23 AM
to BBR Development

Hi ~ before I start: I'm not good at English, so please bear with any mistakes I make.


   I've been working on improving BBR, especially when there is jitter in the network. I assume this jitter is generated by noise between a user device, say a cell phone, and a wireless router.

As you may know, BBR performs quite poorly when there is large jitter.


   In my experiment, I have three CentOS 7 virtual machines on one Windows host: a client, a server, and a WAN emulator running NETem. Using NETem I can set the loss rate, delay, and jitter. By having the client make HTTP requests with 'wget', I measure the average throughput on the client side. I repeated the same experiment about 10 times and used the average.

 

   When the loss rate = 0%, delay = 100 ms, and the bottleneck bandwidth = 100 Mbps, with jitter over about 150 ms, BBR's throughput was around 5 Mbps while CUBIC's was around 20 Mbps. So I made some changes to the BBR code to extend its jitter tolerance and tested them. The modified version showed roughly a 10 Mbps improvement in average throughput over the original BBR when jitter is over 150 ms, without changing the performance the original BBR already had (at least so far; see the figure below). I have also tested it for issues other than throughput, such as RTT-fairness and convergence, and tried to generalize the result so that it can be deployed in the real world.

[figure: throughput vs. jitter for cubic, bbr, and tproxy_bbr]
   But I'm not sure about my assumptions or about the practicality of my algorithm in the real world. I'd appreciate your advice on the questions below.


1) Are there any reported issues related to jitter in wireless networks? If so, could you tell me what the issues are and what range of jitter occurs in the real world?

 

2) I'd like to know where BBR can be improved, especially when it's deployed in wireless networks.

 

3) Do you think my attempt to extend BBR's jitter tolerance is meaningful and useful in the real world? I really want to know whether it's worth continuing.

 

ValdikSS

Jul 11, 2017, 6:32:51 AM
to BBR Development
On Tuesday, July 11, 2017 at 9:34:23 AM UTC+3, Tony James wrote:



1) Are there any reported issues related to jitter in wireless networks? If so, could you tell me what the issues are and what range of jitter occurs in the real world?


Yes, of course. Wireless drivers tend to buffer packets, which introduces jitter. Wi-Fi network managers perform periodic active scanning with channel hopping, which gives you random 100+ ms latency increases.
Also see http://blog.cerowrt.org/
 

Tony James

Jul 11, 2017, 7:25:20 AM
to BBR Development

I forgot to mention before: tproxy_bbr in the graph above is the variant I made from the original bbr.

Neal Cardwell

Jul 11, 2017, 10:46:46 AM
to Tony James, BBR Development
On Tue, Jul 11, 2017 at 2:34 AM, Tony James <a322...@gmail.com> wrote:

   But I'm not sure about my assumptions or about the practicality of my algorithm in the real world. I'd appreciate your advice on the questions below.


Thanks for your report! The graph you posted is very interesting, and it would be interesting to know what sort of approach you used in your patches, and how it fared in terms of queueing delay, packet loss, etc., particularly in scenarios with multiple flows sharing a bottleneck.

1) Are there any reported issues related to jitter in wireless networks? If so, could you tell me what the issues are and what range of jitter occurs in the real world?


Yes, the issues related to BBR on wifi LANs are well-known, and have been discussed on the bbr-dev list a number of times. You can find a sample of the bbr-dev threads that mention wifi by searching in the e-mail list archive:

We would have to look at the packet traces from your tests to be sure, but I suspect you are running into the known issues with the BBR cwnd computation being too conservative to reach full utilization for paths where the jitter is larger in magnitude than the two-way propagation delay of the path. This includes wifi LAN paths, due to the structure of the ACK stream in such links. Here is an example bbr-dev thread with some discussion, traces, and analysis:


The BBR team at Google is actively working on this.
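As a rough back-of-the-envelope illustration of why the cwnd cap can bind here (the numbers below are just for concreteness, not measurements from your tests):

#include <stdio.h>

/* Illustrative only: why a cwnd capped at cwnd_gain * BtlBw * min_rtt can
 * under-utilize a path whose jitter exceeds the two-way propagation delay. */
int main(void)
{
    double btlbw_bytes_per_s = 100e6 / 8.0; /* 100 Mbit/s bottleneck        */
    double min_rtt_s         = 0.100;       /* 100 ms two-way propagation   */
    double jitter_s          = 0.150;       /* 150 ms of extra delay        */
    double cwnd_gain         = 2.0;         /* BBR's usual cwnd_gain        */

    double cwnd_cap_bytes = cwnd_gain * btlbw_bytes_per_s * min_rtt_s;
    double needed_bytes   = btlbw_bytes_per_s * (min_rtt_s + jitter_s);

    printf("cwnd cap:        %.1f MB\n", cwnd_cap_bytes / 1e6); /* 2.5 MB */
    printf("inflight needed: %.1f MB\n", needed_bytes / 1e6);   /* 3.1 MB */
    return 0;
}

With 150 ms of extra delay on a 100 ms path, keeping the bottleneck busy needs more data in flight than the BDP-based cwnd cap allows, which is the kind of under-utilization described above.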
 

2) I'd like to know where BBR can be improved, especially when it's deployed in wireless networks.


We discussed this in the BBR presentation at IETF 98 in March, in slides 13-20:
  

3) Do you think my attempt to extend BBR's jitter tolerance is meaningful and useful in the real world? I really want to know whether it's worth continuing.

Yes, it is definitely useful to work in this area, and there is ongoing work at Google on it. In particular, there are trade-offs, and the challenge is to allow extra data in flight for high-jitter paths in the cases where it is needed, and not in other cases (where the extra data in flight just leads to extra queuing delay and packet loss).

cheers,
neal


Tony James

Jul 18, 2017, 3:51:18 AM
to BBR Development


On Tuesday, July 11, 2017 at 7:32:51 PM UTC+9, ValdikSS wrote:

1) Are there any reported issues related to jitter in wireless networks? If so, could you tell me what the issues are and what range of jitter occurs in the real world?


Yes, of course. Wireless drivers tend to buffer packets, which introduces jitter. Wi-Fi network managers perform periodic active scanning with channel hopping, which gives you random 100+ ms latency increases.
Also see http://blog.cerowrt.org/
 

Thank you for your answer~ and I'm sorry for getting back to you this late.

[figure: test environment with a PC, a laptop, and a smart phone]

I set up this environment to test my algorithm as well as bbr. I noticed the delay was really short between the laptop and the PC (I used ping to estimate it), but it was around 100 ms between the smart phone and the laptop. Could you explain this to me?
In case you're curious about the ping data, I attached it.


I visited the web site, http://blog.cerowrt.org/, but I didn't find any issues related to jitter in wireless networks. Did I miss something?
If so, I would be really grateful if you could give me some specific links.
laptop.txt
phone.txt

Tony James

Jul 18, 2017, 8:06:27 AM
to BBR Development, a322...@gmail.com
Thank you for your answer~!! It was really helpful, and I'm sorry for replying this late.

Here I've written down my algorithm, from the basic concepts to the details, and I've tried to make it clear and easy to understand using figures.
In addition, I've attached my patch.

1. Motivation

As I said last time, my goal was to extend BBR's jitter tolerance, because I thought the reasons BBR did poorly in WLANs were large jitter and possibly a low loss rate.

2. Basic Idea

1) Insert a module that monitors whether jitter is occurring in the network; if it is, have bbr use the information from before the jitter started.

2) Make bbr less sensitive to RTT, expecting it to gain more jitter tolerance and hence better performance when there is jitter.

- This idea was suggested by the fact that bbr tolerates loss better than cubic because it doesn't use loss as a signal for resizing the congestion window. In other words, bbr isn't affected by loss and is therefore insensitive to it. I thought the same approach could work for jitter tolerance.

 

I essentially use a combination of 1) and 2); a rough sketch of the monitoring idea from 1) is shown below, and I'll explain all of the details after that.
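To make idea 1) a bit more concrete, here is a small, self-contained C sketch of what such a monitor could look like. It is only an illustration, not the attached patch itself; the names and the threshold are placeholders.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative jitter monitor: flag "jitter active" when an RTT sample strays
 * far from the pre-jitter baseline. While the flag is set, the sender would
 * keep using the pre-jitter information (the baseline) as min_rtt. */
struct jitter_monitor {
    uint32_t baseline_rtt_us;   /* min_rtt observed before jitter started   */
    uint32_t spread_thresh_us;  /* how far a sample may stray from baseline */
    bool     jitter_active;
};

static void jitter_monitor_update(struct jitter_monitor *m, uint32_t sample_rtt_us)
{
    uint32_t delta = sample_rtt_us > m->baseline_rtt_us
                         ? sample_rtt_us - m->baseline_rtt_us
                         : m->baseline_rtt_us - sample_rtt_us;

    m->jitter_active = delta > m->spread_thresh_us;
}

int main(void)
{
    struct jitter_monitor m = { 100000 /* 100 ms */, 100000 /* 100 ms */, false };

    jitter_monitor_update(&m, 260000);                /* 260 ms sample */
    printf("jitter_active = %d\n", m.jitter_active);  /* prints 1      */
    return 0;
}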

 

3. specific explanation

1) scenario

To simplify the situation, I assume that there is no jitter between time 0 and A, and that there is jitter after A (I'll explain the reason later).

<figure 1>

Let me explain <figure 1> first. But before jumping into the details, I need to state my assumptions.

Assumption 1 >

There is no jitter between time 0 and A. After A, there is some jitter between the wireless router and the cell phone because of noise.

Assumption 2 >

A bbr connection started at time 0, and min_rtt was set at point B because the RTT at that point was the lowest so far.

Under these circumstances, jitter in the yellow area can change min_rtt, because it may produce an RTT sample lower than min_rtt. In contrast, jitter in the green area does nothing, because it only produces samples bigger than min_rtt. More specifically, according to bbr,

 

min_rtt = min( min_rtt , RTT sample )

so jitter from the green area is simply discarded. If we analyze it in terms of the operating point:

<figure 2>

Discarding an RTT sample from the green area is the same as not allowing bbr's operating point to move a->b, and bbr does this by itself. Changing min_rtt to an RTT sample from the yellow area is the same as moving the operating point a->c. If a->c happens, as you can see from <figure 2>, the amount of data in flight decreases, and hence throughput drops. This is obviously not what we want.


As a result, I want my algorithm to block the transition a->c. But then I have to set a rule for what range of jitter bbr should react to and what range it should ignore. More specifically:

<figure 3-2>

As in <figure 3-2> above, given a min_rtt, I have to decide how to choose the point A. For this, I assumed there are two concepts of jitter (see the figure below).

[figure: the two concepts of jitter]

I guess that if the bottleneck queue is the only source of extra delay that bbr has to follow, the jitter-sensitive range in <figure 3-2> can be obtained by calculating (Bottleneck Queue Size / BtlBw).
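As a quick numeric illustration of that formula (the queue size here is just an assumed example value, not something I measured):

#include <stdio.h>

/* Jitter-sensitive range = bottleneck queue size / BtlBw, assuming the
 * bottleneck queue is the only source of extra delay. */
int main(void)
{
    double btlbw_bytes_per_s = 100e6 / 8.0; /* 100 Mbit/s = 12.5 MB/s           */
    double queue_bytes       = 1.25e6;      /* assumed 1.25 MB bottleneck queue */
    double range_ms          = queue_bytes / btlbw_bytes_per_s * 1000.0;

    printf("jitter-sensitive range: %.0f ms\n", range_ms); /* 100 ms */
    return 0;
}

So with a 1.25 MB queue in front of a 100 Mbps bottleneck, the range would be about 100 ms.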


4. result

[figure: throughput vs. jitter for cubic, bbr, and tproxy_bbr (the same graph as in my first post)]

This is the same graph that I posted with my first question. The orange line is my version. I set delay = 100 ms, loss = 0%, pacing enabled, and I also have bbr drop an RTT sample if it is less than (min_rtt - 100 ms).
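In plain C, the rule behaves roughly like the sketch below. This is a simplified illustration of the behavior, not the kernel patch itself; the function name and the fixed 100 ms constant are just for the example.

#include <stdint.h>
#include <stdio.h>

#define JITTER_TOLERANCE_US 100000u /* 100 ms, the value used in the test above */

/* Modified min_rtt update: a sample more than JITTER_TOLERANCE_US below the
 * current minimum is treated as a jitter artifact and ignored; otherwise the
 * usual min filter applies. */
static uint32_t update_min_rtt(uint32_t min_rtt_us, uint32_t sample_rtt_us)
{
    if (min_rtt_us > JITTER_TOLERANCE_US &&
        sample_rtt_us < min_rtt_us - JITTER_TOLERANCE_US)
        return min_rtt_us;                              /* drop the outlier  */

    return sample_rtt_us < min_rtt_us ? sample_rtt_us : min_rtt_us;
}

int main(void)
{
    uint32_t min_rtt_us = 200000;                       /* 200 ms            */
    printf("%u\n", update_min_rtt(min_rtt_us, 50000));  /* 200000: dropped   */
    printf("%u\n", update_min_rtt(min_rtt_us, 150000)); /* 150000: accepted  */
    return 0;
}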

 

* One thing I want to mention is that the bigger the loss rate is, the smaller the improvement my algorithm shows. I guess this is related to the transient processing routine bbr has.



5. patch

The patch itself is attached to this message.

That's it.


6. Additional Explanation

You probably noticed from 4. result that cubic behaves consistently as jitter increases, but bbr does not; it shows a lot of variation. I guess the reason is that bbr is strongly affected by which state it is in when a given amount of jitter occurs. This is why I modeled the network as having a period without jitter and a period with jitter, split at point A.


7. Additional Experiment

As I showed in 4. result, when I use my algorithm in the NETem emulation environment, the throughput improvement is noticeable when jitter >= 150 ms. But I wanted to know whether it works in the real world, so I set up an environment like the one below.

[figure: real-world test environment (PC, laptop, smart phone)]

First, I used the laptop as the client and the PC as the server. When I pinged between them, the delay was around 10 ms. Given how my algorithm works, it seemed unlikely to show an improvement on such a low-delay network.

 

Second, I used the cell phone as the client and the laptop as the server. When I pinged, the delay was around 100 ms, so I thought this environment would be good to use. But I'm having a hard time because I have no idea how to use a cell phone for this kind of experiment.

 

I would be really grateful for any ideas or comments on my algorithm~

JR Lin

Jul 18, 2017, 11:08:10 PM
to BBR Development
The phenomenon you captured may be due to the power-saving mechanism of the smart phone. For example, if there is no traffic for about 200 ms, the WLAN interface of the smart phone goes to sleep. You can ping the smart phone more frequently (with a period of less than 200 ms, for example) and check whether the jitter is still so high.

On Tuesday, July 18, 2017 at 3:51:18 PM UTC+8, Tony James wrote:

Tony James

Jul 19, 2017, 12:59:44 AM
to BBR Development


On Wednesday, July 19, 2017 at 12:08:10 PM UTC+9, JR Lin wrote:
The phenomenon you captured may be due to the power-saving mechanism of the smart phone. For example, if there is no traffic for about 200 ms, the WLAN interface of the smart phone goes to sleep. You can ping the smart phone more frequently (with a period of less than 200 ms, for example) and check whether the jitter is still so high.

Thank you for your answer~ that's really interesting and actually helpful. Thanks again.

I tested with "ping -i 0.01 [ IP ]" and got exactly the result you described: it's around 10 ms, just like the laptop case.

Hmm... so I guess it's fair to conclude that the normal WLAN delay is around 10 ms, and hence so is the normal jitter pattern.
 



 

Tony James

Jul 19, 2017, 1:06:36 AM
to BBR Development

And this is the result file from "ping -i 0.01 [IP]".
short_interval.txt

Tony James

Jul 20, 2017, 7:47:36 PM
to BBR Development, a322...@gmail.com


On Tuesday, July 11, 2017 at 11:46:46 PM UTC+9, Neal Cardwell wrote:
 

Thanks for your report! The graph you posted is very interesting, and it would be interesting to know what sort of approach you used in your patches, and how it fared in terms of queueing delay, packet loss, etc., particularly in scenarios with multiple flows sharing a bottleneck.


Hi Neal~! I posted my patch and the approach I used. It would be really nice if you could make some comments or give me advice on it.

Thank you. 

tony.  