Standalone use/testing of BBR


Pavel P

Oct 20, 2017, 11:51:09 PM
to BBR Development
Hi,

Is it possible to test BBR by plugging its logic into a custom UDP-based protocol that tries to deliver data reliably over UDP?
Currently I use a basic implementation of TCP congestion logic as described in this TCP tutorial: http://www.ssfnet.org/Exchange/tcp/tcpTutorialNotes.html
In short, I transfer a few megabytes reliably over UDP, and because of the basic TCP logic I have, my transmission rate drops close to zero in the presence of packet losses and large RTTs (on mobile data networks). It seems like BBR was designed to address exactly these issues.

My code is a mess, but in short it does basic congestion-window bookkeeping, and it is fairly simple.

Thanks

Beyers Cronje

Oct 21, 2017, 12:56:34 AM
to BBR Development
You should be able to do it. In fact, that is what QUIC does. You will still need a separate loss-detection implementation like RACK, and since BBR relies on pacing, your packet-send logic will need to implement pacing.

Neal Cardwell

Oct 21, 2017, 11:32:23 AM
to Beyers Cronje, BBR Development
Indeed, it should be possible to plug BBR into almost any congestion control framework that can support a congestion window and a pacing rate.
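As an illustration, a controller exposing just those two knobs to a UDP transport might look like the sketch below. All names here are hypothetical and not taken from any real BBR code; this only shows the send-side gates, while a RACK-style loss-detection module would feed ACK/loss events into the controller separately.

```python
class PacedSender:
    """Minimal sketch of the two knobs a BBR-style controller exposes."""

    def __init__(self, cwnd_bytes, pacing_rate_bps):
        self.cwnd_bytes = cwnd_bytes            # cap on unacked data in flight
        self.pacing_rate_bps = pacing_rate_bps  # rate at which packets are released

    def can_send(self, bytes_in_flight):
        # Congestion-window gate: never exceed the in-flight cap.
        return bytes_in_flight < self.cwnd_bytes

    def pacing_delay_s(self, packet_bytes):
        # Pacing gate: space packets at the pacing rate instead of bursting.
        return packet_bytes * 8 / self.pacing_rate_bps

# At 8 Mbps, 1500-byte packets go out one every 1.5 ms.
sender = PacedSender(cwnd_bytes=30_000, pacing_rate_bps=8_000_000)
```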

neal

On Sat, Oct 21, 2017 at 12:56 AM, Beyers Cronje <bcr...@gmail.com> wrote:
You should be able to do it. In fact, that is what QUIC does. You will still need a separate loss-detection implementation like RACK, and since BBR relies on pacing, your packet-send logic will need to implement pacing.

--
You received this message because you are subscribed to the Google Groups "BBR Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbr-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Pavel P

Nov 14, 2017, 11:29:30 PM
to BBR Development
In my project I ended up developing my own algorithm, since it wouldn't be easy to extract the BBR code from Linux TCP or from QUIC (because of dependencies, and because all of it would need to be stuffed into another library where I don't have total control over what I can do). My algorithm's approach is to measure the max bandwidth and the min RTT, adapt to those measurements, and keep re-measuring them as the link changes (my background is mainly in VoIP; I did similar things for video calls years ago, when mobile internet was extremely slow). So, in general, I think my approach is similar to what BBR does.
Does BBR properly handle cases like a 6 KB/s link with a 10 ms RTT (where a packet has to be sent every 250 ms while the RTT is just 10 ms)? Would it work properly? Would it be able to "sense" and adapt to a "whopping" 50% bandwidth increase to 9 KB/s? My code works well when there are multiple packets per RTT, but when that's not the case it probably won't work well. How does BBR handle this?
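The measure-max-BW / min-RTT idea described above can be sketched with simple time-windowed filters. This is an illustration of the general technique, not BBR's actual windowed-filter implementation:

```python
from collections import deque

class WindowedFilter:
    """Track the best (max or min) sample seen over a sliding time window."""

    def __init__(self, window_s, best):
        self.window_s = window_s
        self.best = best              # max for bandwidth, min for RTT
        self.samples = deque()        # (timestamp, value) pairs

    def update(self, now, value):
        self.samples.append((now, value))
        # Drop samples older than the window so stale link conditions expire.
        while now - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        return self.best(v for _, v in self.samples)

bw_est = WindowedFilter(window_s=10.0, best=max)   # max delivery rate
rtt_est = WindowedFilter(window_s=10.0, best=min)  # min round-trip time
```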
Pavel



Jonathan Morton

Nov 14, 2017, 11:35:31 PM
to Pavel P, BBR Development

On 15 Nov 2017 06:29, "Pavel P" <pavlov...@gmail.com> wrote:

> Does BBR properly handle cases like a 6 KB/s link with a 10 ms RTT (where a packet has to be sent every 250 ms while the RTT is just 10 ms)?

Such a thing is impossible.  If it takes 250ms to send each packet (serialisation delay), then the RTT must be at least that, and BBR will measure it as such.

- Jonathan Morton

Neal Cardwell

Nov 15, 2017, 12:33:17 PM
to Jonathan Morton, Pavel P, BBR Development
From the one-packet-per-250ms time scale, and assuming a ~1500-byte MTU, I take it that 6KB/s means 6 kilobytes per second (49 Kbps). At that rate it does seem like a minimal-sized TCP/IPv4 packet could be serialized in a little under 10 ms. In this kind of super-low-bandwidth case, BBR would pick up the initial RTprop sample from the SYN/ACK handshake and use that for the first RTprop filter window; after that it would switch to the measured RTT from the full-sized packets.
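The under-10 ms serialization claim checks out with back-of-the-envelope arithmetic (illustrative numbers only):

```python
# A minimal TCP/IPv4 packet is 40 bytes of headers (20 TCP + 20 IPv4)
# with no payload; 6 KB/s here is taken as 6 * 1024 bytes per second.
min_packet_bits = 40 * 8                                # 320 bits
link_bps = 6 * 1024 * 8                                 # 49,152 bits per second
serialization_ms = min_packet_bits * 1000 / link_bps    # ~6.5 ms, under 10 ms
```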

We routinely run tests of Linux TCP BBR down to 16Kbps, and it seems to work.

To check your 6KB/s case, I just ran tests of 1 and 4 Linux TCP BBR flows on a netem-emulated path with 6KB/s = 49Kbps and got results that are, IMHO, fine:

- BBR utilizes the pipe fully while keeping the queue bounded (no bufferbloat - e.g. with one flow the median RTT is 787ms)

- BBR achieves approximate fairness in the multi-flow case (bytes transferred per flow in a 3-minute test: {577753, 443089, 632777, 524177}; perfect fairness would be 6*1024 * 360 / 4 = 552960)
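As a sanity check on those per-flow byte counts, Jain's fairness index (a standard metric where 1.0 means perfectly equal shares) comes out close to 1. The numbers below reuse the byte counts from the test; the check itself is only an illustration:

```python
def jain_index(shares):
    # Jain's fairness index: (sum x)^2 / (n * sum x^2); equals 1.0 when
    # every flow gets an identical share.
    return sum(shares) ** 2 / (len(shares) * sum(x * x for x in shares))

flows = [577753, 443089, 632777, 524177]
fair_share = 6 * 1024 * 360 / 4      # 552,960 bytes per flow if perfectly fair
fairness = jain_index(flows)         # just under 1.0 for these four flows
```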

cheers,
neal



Pavel P

Nov 15, 2017, 2:33:50 PM
to BBR Development
It's very possible: 1000 concurrent connections on a 10 Mbps link should result in 10 Kbps for each connection, regardless of RTT.

Pavel P

Nov 15, 2017, 2:50:08 PM
to BBR Development
In my implementation (not of BBR, but of similar bandwidth and RTT estimation logic) I do calculations per block. A new block is allocated in a queue once I get an ACK for the last block in the queue, which means that each block keeps stats for one RTT's worth of packets. When the delay between packets is longer than one RTT, I end up with blocks that each hold only a single packet. It kind of works, but it is not as precise as in a normal-bitrate scenario.
I just reviewed the BBR paper published on acm.org, and it seems that BBR calculates a bitrate for each packet individually.

Regarding your measurements: you got a 787 ms median RTT, but the original network spec was a 10 ms RTT with 6 KB/s (4 packets per second) of bandwidth. Shouldn't BBR detect the queue, drain it, and pace packets so that the RTT stays close to the min?

Pavel

Neal Cardwell

Nov 15, 2017, 3:45:45 PM
to Pavel P, BBR Development
On Wed, Nov 15, 2017 at 2:50 PM, Pavel P <pavlov...@gmail.com> wrote:
In my implementation (not of BBR, but of similar bandwidth and RTT estimation logic) I do calculations per block. A new block is allocated in a queue once I get an ACK for the last block in the queue, which means that each block keeps stats for one RTT's worth of packets. When the delay between packets is longer than one RTT, I end up with blocks that each hold only a single packet. It kind of works, but it is not as precise as in a normal-bitrate scenario.
I just reviewed the BBR paper published on acm.org, and it seems that BBR calculates a bitrate for each packet individually.

Yes, BBR calculates a delivery rate (bandwidth) sample on every ACK, measuring the average delivery rate over the time-scale of the last round-trip.
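That per-ACK sampling can be sketched as follows. This is a simplified model of the idea; the field names are made up, and the real Linux implementation (in tcp_rate.c) does considerably more bookkeeping:

```python
class RateSampler:
    """Simplified per-ACK delivery-rate sampling."""

    def __init__(self):
        self.delivered_bytes = 0      # total bytes delivered (ACKed) so far

    def on_send(self, packet, now):
        # Snapshot the delivery state into the packet at send time.
        packet["delivered_at_send"] = self.delivered_bytes
        packet["sent_time"] = now

    def on_ack(self, packet, acked_bytes, now):
        # Rate sample: data delivered since this packet was sent, divided
        # by the elapsed time, giving an average over roughly one round trip.
        self.delivered_bytes += acked_bytes
        interval = now - packet["sent_time"]
        if interval <= 0:
            return None               # degenerate interval, no usable sample
        return (self.delivered_bytes - packet["delivered_at_send"]) / interval
```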
 
Regarding your measurements: you got a 787 ms median RTT, but the original network spec was a 10 ms RTT with 6 KB/s (4 packets per second) of bandwidth. Shouldn't BBR detect the queue, drain it, and pace packets so that the RTT stays close to the min?

Not quite. The BBR control loop does not operate by attempting to keep the RTT close to the min RTT. In short, BBR tries to estimate the available bandwidth and the minimum RTT, and to maintain the amount of in-flight data around a level that it estimates will achieve the available bandwidth. Please see the ACM Queue article ("BBR: Congestion-Based Congestion Control") for an overview of the BBR algorithm.

In general, TCP needs at least four packets in flight to avoid leaving the pipe underutilized due to the stop-and-wait behavior (loss of pipelining) induced by delayed ACKs. With 4 packets in flight and the 250 ms full-sized-packet serialization delay you specified, the best you could do would be an RTT of ~250 ms. But that would only happen if the sender let the pipe go completely empty before sending the next packet, so that the benefit of pipelining is lost (risking underutilizing the pipe). So, to handle pipelining effects and delayed ACKs, there will typically be 1-3 packets in the pipeline ahead of a given packet, and the RTT would be expected to vary between ~500 ms and ~1000 ms. A median of 787 ms therefore does not seem too unreasonable, AFAICT.
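The RTT range in that argument follows directly from the serialization delay (a quick arithmetic check):

```python
# At 6 KB/s = 49,152 bps, a 1500-byte packet takes ~244 ms to serialize.
serialization_s = 1500 * 8 / (6 * 1024 * 8)

# With 1-3 packets queued ahead, a packet waits 1-3 serialization delays
# plus its own, putting the RTT between roughly 2x and 4x that delay.
rtt_low_s = 2 * serialization_s     # ~0.49 s
rtt_high_s = 4 * serialization_s    # ~0.98 s
```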

neal




Pavel P

Nov 15, 2017, 3:53:24 PM
to BBR Development
In my case I'm building a custom UDP-based protocol, so I guess my questions are more applicable to BBR's performance in QUIC than to BBR in TCP. Would it keep the average RTT within 10 ms (or at least within 250 ms, for one packet sent ahead)?

Pavel

Neal Cardwell

Nov 15, 2017, 4:12:26 PM
to Pavel P, BBR Development, Ian Swett, Jana Iyengar, Victor Vasiliev
Hi Pavel,

In the code at:

 https://chromium.googlesource.com/chromium/src/net/+/master/quic/core/congestion_control/bbr_sender.cc

I see:
  // The minimum CWND to ensure delayed acks don't reduce bandwidth measurements.
  // Does not inflate the pacing rate.
  const QuicByteCount kMinimumCongestionWindow = 4 * kMaxSegmentSize;

So currently I believe the QUIC BBR implementation would behave similarly to the Linux TCP implementation, in this respect. I am CC-ing the QUIC BBR authors, in case I am misreading the code.

But that's an open source user-space implementation, so I suppose it would be reasonable to make your implementation more gentle, and use a smaller floor for in-flight data in your application.

neal




Pavel P

Nov 15, 2017, 4:59:30 PM
to BBR Development
Thanks for the link, I'll study this implementation for ideas on how to improve my logic :)

In my algorithm I mostly don't use a cwnd directly (only for max BW estimation); when I pace packets, I simply let a packet out of the pipe at the moment it would need to be sent to sustain a given targetSpeed in non-app-limited operation.
Something like this:

bytesToSend = targetSpeed * (now - targetSpeedStartTime) - targetSpeedStartBytes;
if (bytesToSend > packet.size) {
    send(packet);
    targetSpeedStartBytes += packet.size;  // charge the budget only when a packet is actually sent
}

and then depending on how far I send ahead (whether I test bytesToSend > 0 or bytesToSend > packet.size), I get a 250 ms or a 10 ms RTT at 6 KB/s.
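That budget-based pacer can be written as a small self-contained model (the same idea as the snippet above, with the byte counter charged only when a packet is actually sent; names are illustrative):

```python
class BudgetPacer:
    """Release a packet once the byte budget accrued at targetSpeed covers it."""

    def __init__(self, target_speed_Bps, start_time=0.0):
        self.target_speed = target_speed_Bps   # bytes per second
        self.start_time = start_time
        self.sent_bytes = 0                    # bytes sent since start_time

    def try_send(self, packet_size, now):
        budget = self.target_speed * (now - self.start_time) - self.sent_bytes
        if budget >= packet_size:
            self.sent_bytes += packet_size     # charge only on an actual send
            return True
        return False
```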

In general, this is the use case that made me think about this kind of scenario with low throughput and fast RTTs. I work on a personal CDN project (unrelated to the gaming protocol where I'm rewriting congestion control), and for testing my balancing algorithms I push bits between servers at full speed (100 Mbps or 1 Gbps). When I do that, I have an SSH console open, and if I try to type a command it shouldn't feel like my server is on the moon :) All I need is 10 KB/s at a 50 ms RTT for my SSH connection when I type something ;) As I understand it, BBR in this case would provide the best possible results compared to whatever existed before it.

In the case of the gaming protocol that I'm working with, ACKs are aggregated over a 10 ms timespan, and the aggregated ACK packet is then sent from a loop that does { sleep(10ms); doSomeWork(); }, which means that not only is there a 10 ms delay for the first aggregated ACK, there is also a variable 0-10 ms delay because of the send loop. All of this is out of my control. So, overall, on low-latency links these algorithm delays totally outweigh the real RTTs.

Pavel

Pavel P

Nov 16, 2017, 2:24:17 PM
to BBR Development
By the way, sorry for the confusion; in my mind I was operating with RTT times that go on top of the time it takes to move each packet. As noted, it takes at least 250 ms to move one packet through the pipe, and the RTT cannot be less than that.