I'm trying to figure out how fast we can load data into Google Compute Engine instances from either a persistent disk or from Google Cloud Storage.
First, regarding the persistent disk: I made a 550 GB persistent disk and found that I could read from it simultaneously on six instances at a throughput of approximately 125 MB/s per instance (each instance was reading a different file, and I was careful to make sure the file was not already cached in RAM). This throughput was much higher than I expected: based on the documentation at https://cloud.google.com/compute/docs/disks/ (specifically, the "Read throughput per GB" figure of 0.12 MB/s), I was expecting a total throughput of 550 GB * 0.12 MB/s per GB = 66 MB/s, yet the combined throughput across all instances was over ten times that, at about 6 * 125 MB/s = 750 MB/s.
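For concreteness, the per-instance measurement was roughly equivalent to the following Python sketch (the path /mnt/pd/testfile is just a placeholder for a file on the mounted disk that the instance has not read before):

```python
import time

def measure_read_throughput(path, chunk_size=8 * 1024 * 1024):
    """Sequentially read `path` and return throughput in MB/s."""
    total_bytes = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.monotonic() - start
    return total_bytes / (1024 * 1024) / elapsed

# /mnt/pd/testfile is a placeholder; use a file this instance has never
# read, so the data comes off the persistent disk, not the page cache.
print(f"{measure_read_throughput('/mnt/pd/testfile'):.1f} MB/s")
```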
Hence my questions are:
Why was the throughput on the persistent disk so much higher than I expected? Do the persistent disk throughput limits given in the documentation only apply to a single instance?
How well can I expect the throughput for a single persistent disk to scale if I have a hundred or more instances attached to that disk? Will I need to create duplicate persistent disks if I want to maintain a high throughput per instance?
Second, regarding loading data from Google Cloud Storage: here I found I could download different files to ten Compute Engine instances simultaneously at a rate of about 30-60 MB/s per instance.
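Each instance's download was roughly equivalent to this sketch using the google-cloud-storage Python client (the bucket and object names are placeholders; each instance fetched a different object):

```python
import os
import time

from google.cloud import storage  # pip install google-cloud-storage

# "my-bucket" and "data/test.bin" are placeholders for the real bucket
# and object names.
client = storage.Client()
blob = client.bucket("my-bucket").blob("data/test.bin")

start = time.monotonic()
blob.download_to_filename("/tmp/test.bin")
elapsed = time.monotonic() - start

size_mb = os.path.getsize("/tmp/test.bin") / (1024 * 1024)
print(f"{size_mb / elapsed:.1f} MB/s")
```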
So, for Cloud Storage, my question is: