Performance over long distance high bandwidth networks


Carl Hofmeister

Aug 16, 2017, 2:50:34 PM
to QUIC Prototype Protocol Discussion group
Hi all,
I have been working on a research project where we have been testing the file transfer performance of a few protocols, including QUIC, over long distance high bandwidth networks. I have some preliminary results comparing QUIC with TCP, as well as some questions if someone is willing to answer them.

I started with the quic-server and quic-client, but found HTTP to be unnecessary for what was exclusively file transfer. I made a modified version of the quic-server and quic-client that is just a clone of the existing SPDY streams with the HTTP components removed, and can send arbitrarily large files. I had hoped that this would improve throughput slightly, but it seems to be the same as the original quic-server and quic-client.

All my transfers were sent from the University of Saskatchewan in Saskatoon, Canada, and I tested different receiving ends in Waterloo, Ontario; Sherbrooke, Quebec; Auckland, New Zealand; and another machine at the University of Saskatchewan (Saskatoon, Canada). The maximum bandwidth of every connection is 1 Gbit/sec.

Round trip times:
U of S: 1ms
Waterloo: 35ms
Sherbrooke: 40ms
Auckland: 180ms

I tested throughput of single streams as well as multiple competing streams.
Units are all in Megabits per second.

1 QUIC stream
UofS: 41.7233
Waterloo: 41.446
Sherbrooke: 40.4536
Auckland: 38.9536

2 competing QUIC streams
UofS: 40.3249
Waterloo: 45.0905
Sherbrooke: 42.7394
Auckland: 38.4414

4 competing QUIC streams
UofS: 21.3517
Waterloo: 35.4937
Sherbrooke: 24.7738
Auckland: 31.1733

1 TCP stream
UofS: 236.565
Waterloo: 300.809
Sherbrooke: 268.247
Auckland: 80.3132

2 competing TCP streams
UofS: 186.822
Waterloo: 201.329
Sherbrooke: 189.109
Auckland: 80.3061

4 competing TCP streams
UofS: 93.1884
Waterloo: 96.5582
Sherbrooke: 102.322
Auckland: 62.6127

1 QUIC stream, 1 TCP stream
UofS: tcp: 193.821500, quic: 30.571950
Waterloo: tcp: 212.380750, quic: 33.172700
Sherbrooke: tcp: 121.629550, quic: 29.944550
Auckland: tcp: 60.834225, quic: 28.134875

2 QUIC streams, 2 TCP streams
UofS: tcp: 89.198750, quic: 27.389675
Waterloo: tcp: 55.140150, quic: 34.871375
Sherbrooke: tcp: 64.906875, quic: 29.896775
Auckland: tcp: 52.522150, quic: 28.242350

4 QUIC streams, 4 TCP streams
UofS: tcp: 25.190400, quic: 12.482325
Waterloo: tcp: 14.821700, quic: 16.252675
Sherbrooke: tcp: 17.351000, quic: 14.555875
Auckland: tcp: 13.298825, quic: 15.776350


In most cases TCP had higher throughput, but in situations of high congestion and larger round trip times (four QUIC streams, four TCP streams to Auckland), QUIC starts to perform better.

I used a 500MB tar file for all the tests, but I also tried a 1.2GB file which gave similar results.
I used dumpcap to capture outgoing packets on the sending machine, and wireshark/tshark to find the average bandwidth. I did multiple transfers throughout the day and took the average of them, but there are still some weird values (transfers to Waterloo sometimes perform better than transfers to the local U of S network).
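For anyone reproducing this, the per-transfer average is just the capture's byte total converted over its duration; a minimal sketch of the conversion (the numbers are illustrative, not from the tests above):

```shell
# bytes-on-the-wire and capture duration, e.g. as reported by `capinfos`
# on the pcap; 500 MB sent in 100 s works out to 40 Mbit/s
awk 'BEGIN { bytes = 500000000; secs = 100; printf "%.1f Mbit/s\n", bytes * 8 / secs / 1e6 }'
# prints: 40.0 Mbit/s
```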

I also tried changing some flags and constants in quic_flags_list.h and quic_connection.cc. In quic_connection.cc, I changed kDefaultRetransmittablePacketsBeforeAck from 2 to 20, and saw a noticeable increase in speed:
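For context, the constant is defined near the top of quic_connection.cc, so the change amounts to a one-line edit; a rough sketch (the exact type and surrounding code vary by chromium revision):

```cpp
// quic_connection.cc (file-local constants): ack every 20 retransmittable
// packets instead of every 2, cutting ack traffic on the reverse path.
// NOTE: the declaration shown is approximate for the 2017-era tree.
const size_t kDefaultRetransmittablePacketsBeforeAck = 20;  // was 2
```

The tradeoff is that a larger value delays acks, which can slow loss detection and congestion-window growth early in a connection.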

1 QUIC stream
UofS: 68.3129
Waterloo: 66.5002
Sherbrooke: 62.5588
Auckland: 61.6268

2 QUIC streams
UofS: 61.772
Waterloo: 64.5282
Sherbrooke: 59.3394
Auckland: 59.5028

4 QUIC streams
UofS: 35.7902
Waterloo: 53.0485
Sherbrooke: 35.5793
Auckland: 50.0515

1 QUIC stream, 1 TCP stream
UofS: tcp: 190.115250, quic: 50.875500
Waterloo: tcp: 221.220250, quic: 49.034500
Sherbrooke: tcp: 218.880500, quic: 43.976825
Auckland: tcp: 61.341775, quic: 47.311275

2 QUIC streams, 2 TCP streams
UofS: tcp: 105.374000, quic: 40.930250
Waterloo: tcp: 72.324425, quic: 50.957400
Sherbrooke: tcp: 97.856250, quic: 38.281200
Auckland: tcp: 54.357875, quic: 48.186700

4 QUIC streams, 4 TCP streams
UofS: tcp: 37.996725, quic: 21.136800
Waterloo: tcp: 25.026875, quic: 28.635550
Sherbrooke: tcp: 29.878325, quic: 20.490775
Auckland: tcp: 24.937275, quic: 27.463325

I also tried using BBR for congestion control. The performance seemed to be slightly better than Cubic with kDefaultRetransmittablePacketsBeforeAck set to 2:

1 QUIC stream
UofS: 49.0245
Waterloo: 48.9832
Sherbrooke: 42.8096
Auckland: 23.9718

2 competing QUIC streams
UofS: 39.498
Waterloo: 44.0193
Sherbrooke: 37.9645
Auckland: 23.675

4 competing QUIC streams
UofS: 20.9513
Waterloo: 36.2447
Sherbrooke: 20.6761
Auckland: 21.98

1 QUIC stream, 1 TCP stream
UofS: tcp: 292.518000, quic: 47.321600
Waterloo: tcp: 229.384000, quic: 42.630550
Sherbrooke: tcp: 267.684000, quic: 40.822400
Auckland: tcp: 79.150450, quic: 24.795650

2 QUIC streams, 2 TCP streams
UofS: tcp: 136.782000, quic: 34.992700
Waterloo: tcp: 93.637250, quic: 45.999450
Sherbrooke: tcp: 109.088150, quic: 35.688450
Auckland: tcp: 75.388550, quic: 23.777550

4 QUIC streams, 4 TCP streams
UofS: tcp: 43.921250, quic: 17.456000
Waterloo: tcp: 25.748300, quic: 27.228700
Sherbrooke: tcp: 26.704600, quic: 16.596000
Auckland: tcp: 27.802950, quic: 19.001650


I have a few questions:
- Do these numbers seem reasonable?
- Is it possible to get better performance out of BBR in QUIC?
I've only enabled BBR in quic_flags_list.h, but I noticed quite a few other flags relating to BBR in the same file and am not sure which ones are worth investigating.
- Are there other parameters and constants worth looking into that might lead to better performance on long distance high bandwidth connections?

Thanks,
- Carl Hofmeister

Pavan K

Aug 17, 2017, 11:23:29 AM
to proto...@chromium.org
Hi Carl,

The numbers seem quite reasonable.


Compared to TCP, QUIC seems to perform better when the packet loss is >5%, due to better congestion control.

My assumption was that FEC was supposed to increase performance. However, FEC has been disabled, and there is discussion around it here: https://groups.google.com/a/chromium.org/forum/#!topic/proto-quic/Z-Fr2-3ixMQ
BBR is indeed an interesting thing to look into, which I didn't. If you find anything else, please drop a note in the thread.

Thanks, 
Pavan

ckr...@chromium.org

Aug 30, 2017, 8:19:26 PM
to QUIC Prototype Protocol Discussion group
tl;dr - the toy quic_server and quic_client are definitely not to be considered a performant representation of QUIC.

There have been some recent improvements in the QUIC chromium code to help with uploads. I did some measurements to evaluate those using chrome and google drive. The measurements were based on uploading a 5GB file to drive.

I built chrome from source since the improvements I mentioned were recent changes, but I think the current dev channel release of chrome should now have caught up to them.

The tests were done on my development workstation, which is a very fast linux machine with great network connectivity. QUIC's measured RTT for my tests was around 1ms. So this is high bandwidth, but not at all long distance.

In my tests, QUIC routinely sustained between 700 and 800 Mbps. The only non-default option that made a difference was enabling AKD4 (ack decimation). Cubic vs. BBR didn't change much, but with a 1ms RTT that's not surprising.
I should mention that drive uses a single QUIC connection for the upload, and it starts a new QUIC stream for every 1GB of data.
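For anyone wanting to try the same option: connection options can reportedly be passed to chrome on the command line rather than by rebuilding; a sketch, assuming the `--quic-connection-options` switch present in 2017-era builds (the switch name is my assumption, so verify it against your tree):

```shell
# ask chrome to negotiate AKD4 (ack decimation) on its QUIC connections
chrome --enable-quic --quic-connection-options=AKD4
```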

I offer these numbers simply as an existence proof that the previously posted data should definitely not be considered close to representing what QUIC can do, as the toy server and toy client are simply not tuned for performance. If you have a reasonably powered PC to run chrome on, you should be able to reproduce my results.

I don't have a great suggestion for external experimentation in this space. If you can focus on send-side improvements only, as in within congestion control, then uploads to drive might be a good option, since building chrome is not that hard, although you'll need a way to impose delay for the distance aspect.

-- Buck

屠越

Sep 3, 2017, 8:28:37 AM
to QUIC Prototype Protocol Discussion group
Hi Buck, 
could you tell me how you set the congestion control algorithm to Cubic or to BBR when doing measurements using chrome and google drive?

Thanks,
Gary


On Thursday, August 31, 2017 at 8:19:26 AM UTC+8, ckr...@chromium.org wrote:

ckr...@chromium.org

Sep 4, 2017, 10:52:51 AM
to QUIC Prototype Protocol Discussion group
To enable BBR, I recompiled chrome with FLAGS_quic_reloadable_flag_quic_default_to_bbr set to true in quic_flags_list.h.
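For anyone repeating this: quic_flags_list.h defines flags through a macro, so the edit is a one-line change of the default value; a sketch (the macro's shape is approximate for the 2017-era tree):

```cpp
// quic_flags_list.h: flip the default so new QUIC connections use BBR
// instead of Cubic. The stock entry has the default set to false.
// NOTE: QUIC_FLAG's exact signature may differ across revisions.
QUIC_FLAG(bool, FLAGS_quic_reloadable_flag_quic_default_to_bbr, true)
```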

Charles 'Buck' Krasic

Sep 5, 2017, 1:12:04 PM
to Fadi Al, proto...@chromium.org


On Sun, Sep 3, 2017 at 5:50 PM, Fadi Al <fadi...@gmail.com> wrote:
Dear Buck, 

The work that my colleague Carl and I have done with QUIC so far, as part of our research project, was trying to get QUIC to perform without the extra baggage of HTTP, and thus be comparable with other data transfer protocols such as TCP with different congestion control mechanisms, GridFTP, FASP, etc.

Our goal is to get QUIC to transfer large files over long distances from one end to another, and to compare its performance with other protocols in terms of bandwidth, time taken, packet loss, etc. Is there any tuned version of the quic server and quic client that can be used to get a performant representation of QUIC that does not use chrome?


tl;dr - no, some work is needed.  

The toy quic server and clients are very simple wrappers around the core QUIC code that is used in Chrome and Google production servers. However, the wrappers were written with simplicity and bare-bones functionality foremost. I think it's fair to say that (close to) zero effort has gone into analyzing, much less rectifying, the performance problems in the wrappers. There could be a small amount related to "tuning", as in setting QUIC parameters, buffer sizes, etc., but I actually doubt this is where the major limitations are for single stream performance. I suspect it is more likely simplistic coding (generous copies, etc.) in the wrappers.

I also doubt that HTTP is a major performance concern, and you'd think it can't hurt to cut it out, but one exception comes to mind. If I were to look at this, I would want to use chrome's net logging to help with analysis. For instance, it would be my first choice for checking whether flow control blocking is an issue. It's not the only way to do it (you could add your own logging), but you'd be duplicating effort. The toy server and client do not have options to generate these net logs, but it should not be difficult to add code for that. It should only be a matter of instantiating a couple of classes from chromium/src/net/log/ at startup: one to enable the capture, and the other to write the results to a file. This may come with extra problems if you remove all the HTTP layering, though.
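On the Chrome side, a net log can be captured from the command line without any code changes; a sketch, assuming a Linux Chrome install (the capture-mode value is from 2017-era builds and may differ in your version):

```shell
# record a NetLog while reproducing the transfer; load the resulting JSON
# in chrome://net-internals (or the standalone netlog viewer) to inspect
# per-stream events such as flow-control blocking
chrome --log-net-log=/tmp/netlog.json --net-log-capture-mode=IncludeSocketBytes
```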

It would probably also be helpful to use a cpu profiler to identify hot spots.
 

Also, is there a branch of development that would reasonably allow uploads that don't use HTTP and a webserver interface to the file system?


I'm afraid not.   

The IETF is standardizing QUIC, and there may be other implementations that become options. My best guess is that those will not be usable for six months to a year from now.

I would really appreciate your kind assistance and I am looking forward to your response.



Regards, 


Fadi  

 




Fadi Al

Sep 11, 2017, 8:20:27 PM
to QUIC Prototype Protocol Discussion group, fadi...@gmail.com

Hi Buck, 

So if I wanted to initiate a transfer between two endpoints and see real QUIC performance, I would need to use chrome built from the chromium source on one side and a web server that uses QUIC on the other side, right?

If that is the case, then I would need something similar to google drive running on the other endpoint, since I need a server that I can control.

Could you please guide me on how to go about that further? Thanks.


-Fadi

Pavan K

Sep 12, 2017, 2:29:38 AM
to QUIC Prototype Protocol Discussion group, fadi...@gmail.com
Could this implementation be used? https://github.com/lucas-clemente/quic-go 

ckr...@chromium.org

Sep 12, 2017, 1:12:05 PM
to QUIC Prototype Protocol Discussion group, fadi...@gmail.com


On Monday, September 11, 2017 at 5:20:27 PM UTC-7, Fadi Al wrote:



One could try. I would be pleasantly surprised if any of the currently available open source QUIC servers performed as well as Google's production QUIC servers.
This will hopefully change as IETF standardization progresses and more implementations gain steam.

I would suggest first setting up a client machine and somehow (netem?) inserting artificial delay. Then see how chrome uploads to drive perform at various delays. BBR should be interesting there.
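netem can impose that artificial delay on the client's egress interface; a minimal sketch, assuming root privileges and an interface named eth0 (both assumptions, adjust for your machine):

```shell
# add 100 ms of delay to every outgoing packet (+100 ms measured RTT)
sudo tc qdisc add dev eth0 root netem delay 100ms

# ...run the transfers, then remove the qdisc to restore normal behavior
sudo tc qdisc del dev eth0 root netem
```

Since the delay applies only to one direction, set it to the one-way latency you want to emulate, not the full RTT.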

howard liao

Jul 26, 2018, 6:16:42 AM
to QUIC Prototype Protocol Discussion group
Yes, I got the same result when testing BBR against Cubic in QUIC.

On Thursday, August 17, 2017 at 11:23:29 PM UTC+8, Pavan K wrote: