Hi ~ before getting started, please bear with me: my English is not great, so there may be some mistakes.
I've been working on improving BBR, especially when there is jitter in the network. I assume this jitter is generated by noise between a user device, say a cellphone, and a wireless router.
As you may know, BBR performs quite poorly when jitter is large.
In my experiment, I have three CentOS 7 virtual machines on one Windows host: a client, a server, and a WAN emulator running netem. With netem I can set the loss rate, delay, and jitter. The client issues an HTTP request with 'wget', and I measure the average throughput on the client side. I repeated each experiment about 10 times and used the average of those runs.
With loss rate = 0%, delay = 100 ms, bottleneck bandwidth = 100 Mbps, and jitter above roughly 150 ms, BBR achieved around 5 Mbps while CUBIC achieved around 20 Mbps. So I made some changes to the BBR code to extend its jitter tolerance and tested them. The modified version showed about a 10 Mbps throughput improvement on average over the original BBR when jitter is above 150 ms, without changing the performance the original BBR had elsewhere (at least so far; see the figure below). I have also tested aspects other than throughput, such as RTT fairness and convergence, and tried to generalize the result so it could be deployed in the real world.
But I'm not sure about my assumption or the practicality of my algorithm in the real world. I'd appreciate your advice on the questions below.
1) Are there any reported issues related to jitter in wireless networks? If so, what are the issues, and what range of jitter occurs in the real world?
2) I'd like to know where BBR can be improved, especially when it is deployed in wireless networks.
3) Do you think my attempt to extend BBR's jitter tolerance is meaningful and useful in the real world? I really want to know whether it's worth continuing.
1) Are there any reported issues related to jitter in wireless networks? If so, what are the issues, and what range of jitter occurs in the real world?
Yes, of course. Wireless drivers tend to buffer packets, which introduces jitter. Wi-Fi network managers perform periodic active scanning with channel hopping, which gives you random 100+ ms latency increases.
Also see http://blog.cerowrt.org/
1. Motivation
As I said last time, my goal was to extend BBR's jitter tolerance, because I believe the reasons BBR does poorly on WLAN are large jitter and a relatively low loss rate.
2. Basic Idea
1) Insert a module that monitors whether jitter is occurring in the network and, if it is, makes BBR keep using the information it had from before the jitter started.
2) Make BBR less sensitive to RTT, and expect that to give it more jitter tolerance and hence better performance when there is jitter.
- This was suggested by the fact that BBR tolerates loss better than CUBIC because it does not use loss as a signal for resizing the congestion window. In other words, BBR is not driven by loss and is therefore insensitive to it. I thought the same could hold for jitter tolerance.
I essentially use a combination of 1) and 2); a rough sketch follows just below, and I'll explain all the details in the next sections.
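To make the combination concrete, here is a rough sketch of the shape I have in mind. This is not the actual patch; all the names here (jitter_state, jitter_monitor, rtt_spread_us, ...) are hypothetical, and the spread estimate is deliberately crude.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical sketch of ideas 1) + 2): watch how far RTT samples spread
     * above the current min_rtt, declare "jitter" once the spread exceeds a
     * threshold, and remember the pre-jitter min_rtt so BBR can keep using it
     * instead of reacting to the noisy samples.
     */
    struct jitter_state {
            uint32_t rtt_spread_us;      /* recent (max RTT - min RTT) estimate */
            bool     jitter_detected;    /* true once spread > threshold */
            uint32_t frozen_min_rtt_us;  /* min_rtt snapshot taken before jitter */
    };

    static void jitter_monitor(struct jitter_state *js, uint32_t rtt_us,
                               uint32_t cur_min_rtt_us, uint32_t threshold_us)
    {
            /* crude spread estimate; a real version would use a windowed filter */
            if (rtt_us > cur_min_rtt_us &&
                rtt_us - cur_min_rtt_us > js->rtt_spread_us)
                    js->rtt_spread_us = rtt_us - cur_min_rtt_us;

            if (!js->jitter_detected && js->rtt_spread_us > threshold_us) {
                    js->jitter_detected = true;
                    js->frozen_min_rtt_us = cur_min_rtt_us; /* keep pre-jitter value */
            }
    }

In the end the concrete rule I test in section 4 is simpler than this, but the intent is the same: once jitter is suspected, the pre-jitter information wins.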
3. Specific explanation
1) Scenario
To simplify the situation, I assume that there is no jitter between time 0 and A, and that there is jitter after A (I'll explain the reason later).
Let me explain <figure 1> first. But before jumping into the details, I need to state my assumptions.
Assumption 1 >
There is no jitter between time 0 and A. After A, there is jitter between the wireless router and the cell phone because of noise.
Assumption 2 >
A BBR connection started at time 0, and min_rtt was set at point B because the RTT at that point was the lowest seen so far.
Under these circumstances, jitter that happens in the yellow area can change min_rtt, because some RTT samples may be lower than the current min_rtt. In contrast, jitter that happens in the green area does nothing, because it only produces samples larger than min_rtt. More specifically, BBR maintains
min_rtt = min( min_rtt , RTT sample )
so the jitter from the green area is simply discarded. Analyzing it from the operating-point view:
Discarding RTT samples from the green area is the same as not allowing BBR's operating point to move a->b, and BBR does this by itself. Updating min_rtt with an RTT sample from the yellow area is the same as moving the operating point a->c. If a->c happens, as you can see from <figure 2>, the amount in flight decreases and hence throughput drops. This is obviously not what we want.
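For reference, the logic is roughly the following. This is a simplified paraphrase of the min_rtt filter in the kernel's tcp_bbr.c, not the literal code; I've reduced it to a standalone function:

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified paraphrase of BBR's min_rtt filter (not the literal kernel
     * code): min_rtt only moves down, except that the estimate is refreshed
     * when the ~10 s min_rtt filter window expires.
     */
    static void bbr_min_rtt_update(uint32_t *min_rtt_us, uint32_t sample_rtt_us,
                                   bool filter_window_expired)
    {
            if (sample_rtt_us <= *min_rtt_us || filter_window_expired)
                    *min_rtt_us = sample_rtt_us;   /* yellow-area sample accepted */
            /* green-area samples (larger than min_rtt) are simply discarded */
    }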
As a result, I want my algorithm to block the a->c transition. But in that case I have to set a rule that says which range of jitter BBR should still react to and which range it should ignore. More specifically,
I guess that if the bottleneck queue is the only thing BBR has to follow, the jitter-sensitive range in <figure 3-2> can be obtained by computing ( Bottleneck Queue Size / BtlBw ).
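As a numeric illustration (the buffer size here is only an assumption for the example, not a measured value): with BtlBw = 100 Mbps and a bottleneck buffer of 1.25 MB,

    jitter-sensitive range = Bottleneck Queue Size / BtlBw
                           = (1.25e6 bytes * 8 bits/byte) / 100e6 bits/s
                           = 0.1 s = 100 ms

which is the same order as the 100 ms tolerance I use in the experiment in section 4.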
4. Result
The result is the same as the one I posted with my first question. The orange line is my version. I set delay = 100 ms, loss = 0%, pacing enabled, and also had BBR drop an RTT sample if it is less than ( min_rtt - 100ms ).
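In terms of the simplified filter shown in section 3, the change is conceptually just a guard in front of the update. This is a sketch of the rule above, not the literal patch, and the parameter name jitter_tolerance_us is mine:

    /* Sketch of the rule above (types as in the earlier sketch): samples more
     * than jitter_tolerance_us below the current min_rtt are treated as jitter
     * noise and dropped, so min_rtt and the operating point stay where they
     * were before the jitter. Here jitter_tolerance_us = 100000 (100 ms).
     */
    static void bbr_min_rtt_update_tolerant(uint32_t *min_rtt_us,
                                            uint32_t sample_rtt_us,
                                            bool filter_window_expired,
                                            uint32_t jitter_tolerance_us)
    {
            /* written as sample + tolerance < min_rtt to avoid u32 underflow */
            if (sample_rtt_us + jitter_tolerance_us < *min_rtt_us)
                    return;                        /* drop: below min_rtt - 100 ms */
            if (sample_rtt_us <= *min_rtt_us || filter_window_expired)
                    *min_rtt_us = sample_rtt_us;
    }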
* One thing I want to mention is that the higher the loss rate, the smaller the improvement my algorithm shows. I guess this is related to the transient-state routines BBR has.
5. Patch
You probably noticed from the result in section 4 that CUBIC behaves consistently as jitter increases, while BBR does not: it shows a lot of variation. I guess the reason is that BBR is strongly affected by which state it is in when a given amount of jitter occurs, and this is why I modeled the network as having a jitter-free period and a jittery period, split at point A.
7. Additional Experiment
As I showed with the result in section 4, when I use my algorithm in the netem emulation environment, the throughput improvement is noticeable in the range jitter >= 150 ms. But I wanted to know whether it works in the real world, so I set up an environment like the one below.
First, I used the laptop as a client and the PC as a server. When I pinged between them, the delay was around 10 ms. Given the characteristics of my algorithm, it seemed hard to show an improvement on such a low-delay network.
Second, I used the cell phone as a client and the laptop as a server. When I pinged, the delay was around 100 ms, so this environment looked suitable. But I'm having a hard time with it because I have no idea how to use a cell phone for this kind of experiment.
Any ideas or comments on my algorithm would be greatly appreciated~
The phenomenon you captured may be due to the power-saving mechanism of the smartphone. For example, if there is no traffic for about 200 ms, the WLAN interface of the smartphone will go to sleep. You could ping the smartphone more frequently (with a period of less than 200 ms, for example) and check whether the jitter is still that high.
Thanks for your report! The graph you posted is very interesting, and it would be interesting to know what sort of approach you used in your patches, and how it fared in terms of queueing delay, packet loss, etc., particularly in scenarios with multiple flows sharing a bottleneck.