AWS Elastic Block Storage benchmark


Rodrigo Campos

Oct 17, 2012, 8:15:52 AM
to guerrilla-cap...@googlegroups.com
I've been working on benchmarking the IO performance of the AWS Elastic Block Storage for some time and recently published the results. You can read about the methodology and see the charts here: http://iomelt.com/iomelt-aws-benchmark-august-2012/

Some interesting findings:

  • The same instance type shows different behavior depending on the region it is running in; this is particularly critical if you depend on multiple regions for disaster recovery or geographical load balancing
    • One could argue that you can now use EBS Optimized instances to overcome this "problem", but I have not tested these instances yet and not every instance type has this feature available
  • Generally speaking, performance is better and more consistent in the South America region than in Virginia, probably because the SA region was the most recently deployed. Maybe the SA datacenter uses newer server models or is simply underutilized, but this is a wild guess
  • Write performance for the medium instance type in Virginia decays abruptly, dropping from almost 400 calls/s to around 300 calls/s. This is not very clear in the scatter plot, but if you draw a time-based chart you can clearly see the pattern. This is the main reason you see two spikes in the density chart.
    • Read performance for the SA small instance shows similar behavior.
  • Small instances definitely should not be used for disk-IO-bound applications since their behavior is rather erratic even for read operations; this is particularly true in the Virginia region

I've done the very same tests on several VPS providers here in Brazil and found some disturbing results. In one case the read and write performance simply plummets at 03:00AM, probably due to backup or maintenance procedures.

Here's a link to the raw data as well as the R script used to create the charts; feel free to send comments and suggestions.

Please note that the R script included in this tarball is not compatible with the latest version of iomelt, though making it compatible is just a quick hack.


Best Regards,
-- 
Rodrigo Campos

Stephen O'Connell

Oct 22, 2012, 6:48:37 PM
to guerrilla-cap...@googlegroups.com
High performance is great, when it is available:


Stephen...

On Mon, Oct 22, 2012 at 2:13 PM, rml...@gmail.com <rml...@gmail.com> wrote:
I took a few minutes to look at your data.

What is the unit of time for Total Time? Seconds? 
If Total Time is in seconds, Little's Law shows a consistent 2560 jobs in the system. Did you use 2560 threads of execution?
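(A sketch of the arithmetic behind this check, hedged because the exact columns used are not shown: Little's Law relates the mean number of jobs in a system N to its throughput X and the time each job spends in it R,

   N = X * R  ~  (calls per second) * (Total Time in seconds)  ~  2560

so a consistent product of about 2560 would point either to 2560 concurrent requests or simply to a fixed number of operations per test pass.)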
You should replot (log and linear y-axis) the interactions and color code the Test type

iomelt.sa<-read.csv('iomelt-m1.large-sa-east-1',header=TRUE,sep=';',stringsAsFactors=TRUE)

iomelt.sa$ts<-strptime(iomelt.sa$Date, "%a %b %d %H:%M:%S %Y")

plot(c(min(iomelt.sa$ts),max(iomelt.sa$ts)),c(min(iomelt.sa$Total.Time),max(iomelt.sa$Total.Time)),type='n',xlab='Time',ylab='Total Time (sec)',log='y')

for(x in 1:nlevels(iomelt.sa$Test)){points(iomelt.sa$ts[iomelt.sa$Test==levels(iomelt.sa$Test)[x]],iomelt.sa$Total.Time[iomelt.sa$Test==levels(iomelt.sa$Test)[x]],col=x,pch=20,cex=.5)}

legend(locator(1),legend=levels(iomelt.sa$Test),col=1:nlevels(iomelt.sa$Test),lwd=3,lty=1)

title("SA")

There is some periodicity to the Serial Write data. There is an influential hit every 1648 seconds, and less influential ones every 832 seconds and 660 seconds. 1648 seconds is nearly a harmonic of 832 seconds.

ISPs frequently oversubscribe customers, so I would expect some uncontrollable time-of-day effects due to internet traffic.

I used locator to pick the approximate center point of the Serial Read data points. I then used date -r to convert the numerical value to PDT:

iomelt robertlane$ date -r 1344754940 
Sun Aug 12 00:02:20 PDT 2012
iomelt robertlane$ date -r  1345101995
Thu Aug 16 00:26:35 PDT 2012
iomelt robertlane$ date -r 1345370444
Sun Aug 19 03:00:44 PDT 2012
iomelt robertlane$ date -r  1345529140
Mon Aug 20 23:05:40 PDT 2012

They do not occur at the same time every day.

Before you get lost in the details of the individual trees, you should answer some questions:
  1. Why are random rewrites faster than serial writes?
  2. Why are random rereads slower than serial reads?
  3. Why is there such a large gap in read and write times?
Here is a histogram with overlaid density plot.

Some of your plots show multiple modes and striations in the data that do not appear with this data set. You need to be able to explain the differences.

Bob


DrQ

Oct 22, 2012, 6:55:42 PM
to guerrilla-cap...@googlegroups.com, s...@saoconnell.com
https://twitter.com/TUAW/status/260503127256596481
Stephen...


Rodrigo Campos

Oct 23, 2012, 6:32:42 AM
to guerrilla-cap...@googlegroups.com
Thanks for your message; I'll do my best to answer your questions. :)


On Mon, Oct 22, 2012 at 7:13 PM, rml...@gmail.com <rml...@gmail.com> wrote:
I took a few minutes to look at your data.

What is the unit of time for Total Time? Seconds? 
If Total Time is in seconds, Little's Law shows a consistent 2560 jobs in the system. Did you use 2560 threads of execution?

The unit is operations per second: it is computed simply by dividing the total number of operations (or bytes) by the number of wallclock seconds it took to perform them, as measured with gettimeofday().

IOMelt is a single-process, single-threaded application. It's not a "stress test" in itself, although it might generate a considerable amount of disk IO. So the answer would be no.
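For reference, a minimal sketch of that calculation (not the actual iomelt source; the file name, block size and operation count below are illustrative): time a batch of write() calls with gettimeofday() and divide the number of operations by the elapsed wallclock seconds.

   #include <fcntl.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>
   #include <sys/time.h>

   int main(void)
   {
      const long operations = 10000;         /* illustrative operation count */
      char buf[4096];                        /* illustrative block size */
      memset(buf, 0, sizeof buf);

      int fd = open("/dev/null", O_WRONLY);  /* stand-in for the workload file */
      if (fd == -1) { perror("open"); return EXIT_FAILURE; }

      struct timeval start, end;
      gettimeofday(&start, NULL);
      for (long i = 0; i < operations; i++)
         if (write(fd, buf, sizeof buf) == -1) { perror("write"); break; }
      gettimeofday(&end, NULL);
      close(fd);

      /* calls/s = operations performed / elapsed wallclock seconds */
      double wallclock = (end.tv_sec - start.tv_sec)
                       + (end.tv_usec - start.tv_usec) / 1e6;
      printf("%.2f calls/s over %.6f s\n", operations / wallclock, wallclock);
      return EXIT_SUCCESS;
   }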
 
You should replot (log and linear y-axis) the interactions and color code the Test type

iomelt.sa<-read.csv('iomelt-m1.large-sa-east-1',header=TRUE,sep=';',stringsAsFactors=TRUE)

iomelt.sa$ts<-strptime(iomelt.sa$Date, "%a %b %d %H:%M:%S %Y")

plot(c(min(iomelt.sa$ts),max(iomelt.sa$ts)),c(min(iomelt.sa$Total.Time),max(iomelt.sa$Total.Time)),type='n',xlab='Time',ylab='Total Time (sec)',log='y')

for(x in 1:nlevels(iomelt.sa$Test)){points(iomelt.sa$ts[iomelt.sa$Test==levels(iomelt.sa$Test)[x]],iomelt.sa$Total.Time[iomelt.sa$Test==levels(iomelt.sa$Test)[x]],col=x,pch=20,cex=.5)}

legend(locator(1),legend=levels(iomelt.sa$Test),col=1:nlevels(iomelt.sa$Test),lwd=3,lty=1)

title("SA")

There is some periodicity to the Serial Write data. There is an influential hit every 1648 seconds, and less influential ones every 832 seconds and 660 seconds. 1648 seconds is nearly a harmonic of 832 seconds.


I might try this later, as I'm not on my own machine right now.
 
ISPs frequently oversubscribe customers, so I would expect some uncontrollable time-of-day effects due to internet traffic.

One of the ISPs I've tested shows consistent IO contention for read and write operations at 03:00 AM every single day, probably due to some backup or other maintenance procedure.
 

I used locator to pick the approximate center point of the Serial Read data points. I then used date -r to convert the numerical value to PDT:

iomelt robertlane$ date -r 1344754940 
Sun Aug 12 00:02:20 PDT 2012
iomelt robertlane$ date -r  1345101995
Thu Aug 16 00:26:35 PDT 2012
iomelt robertlane$ date -r 1345370444
Sun Aug 19 03:00:44 PDT 2012
iomelt robertlane$ date -r  1345529140
Mon Aug 20 23:05:40 PDT 2012

They do not occur at the same time every day.

Before you get lost in the details of the individual trees, you should answer some questions:
  1. Why are random rewrites faster than serial writes?
Random rewrites are performed on an already existing file, while serial writes are performed on a brand new, empty file. The time spent allocating new inodes is probably what makes serial writes take longer; I need to dig deeper into this, but that is the likely cause.
  2. Why are random rereads slower than serial reads?
Random rereads include an lseek() call to a random position before every read(), and this is what makes them take longer:

      lseek(fd,random() % (fileSize - blockSize), SEEK_SET);
      rc = read(fd,buf,blockSize);

 
  3. Why is there such a large gap in read and write times?
IOMelt tries to overcome file system cache effects by using posix_fadvise(), but there's no way to disable caching in other layers such as the hypervisor or the underlying hardware. If I comment out the posix_fadvise() call in the source, the gap you noticed becomes much larger, since the read operations get even faster.

Here's the code that tries to avoid caching:

   /* *TRY* to minimize buffer cache effect */
   /* There's no guarantee that the file will be removed from buffer cache though */
   /* Keep in mind that buffering will happen at some level */
   #ifndef __APPLE__
   rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
   if (rc != 0)
   {
      perror("main() - Unable to use posix_fadvise() on workload file");
   }
   #else
   rc = fcntl(fd, F_NOCACHE, 0);
   if (rc == -1) perror("main() - [__APPLE__] fcntl F_NOCACHE error");
   rc = fcntl(fd, F_RDAHEAD, 0);
   if (rc == -1) perror("main() - [__APPLE__] fcntl F_RDAHEAD error");
   #endif

Write operations are usually not affected that much by caches: O_SYNC is passed as an open() flag, which causes the operating system to try to commit every single write() call to the underlying hardware.
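For reference, a minimal sketch of opening a workload file with O_SYNC so that every write() is pushed toward the device before it returns; the file name and block size below are illustrative, not the actual iomelt defaults.

   #include <fcntl.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>

   int main(void)
   {
      char buf[4096];                        /* illustrative block size */
      memset(buf, 0, sizeof buf);

      /* With O_SYNC, write() only returns once the data (and the metadata
         needed to retrieve it) has been committed to the underlying device,
         as far as the OS can guarantee. */
      int fd = open("workload.dat", O_WRONLY | O_CREAT | O_SYNC, 0644);
      if (fd == -1) { perror("open"); return EXIT_FAILURE; }

      if (write(fd, buf, sizeof buf) == -1)
         perror("write");

      close(fd);
      return EXIT_SUCCESS;
   }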

One commonly used method to try to overcome caches is increasing the workload file size to something that would not fit in normal caches. I've not done this because my main interest was to understand whether the AWS EBS IO is consistent over time, not to measure capacity itself.

I just ran a similar test on one VPS to demonstrate the results when using different file sizes. The virtual machine has 1 GB of total memory; in the first run the file size is 50 MB, in the second it is 1.17 GB:



Test                               50 MB file    1.17 GB file
Serial Write Calls/s               1981.95       2143.91
Serial Read Calls/s                157878.50     52598.55
Random Rewrite Calls/s             4937.62       209.05
Random Reread Calls/s              15256.07      424.45
Random Mixed Read/Write Calls/s    6765.20       231.04



Serial write is the least affected, while serial read throughput is almost three times higher with the smaller file.


I was amazed by the other results, though. I imagined that lseek() would be a problem with the larger file, but did not expect it to be such a big problem.


This is what amazed me the most:


Finished all tests

Total wallclock time: 3672.165418

CPU user/system time: 1.095833/31.895151


I will definitely double-check this, but I was getting more than 97% iowait during the tests, which indicates that the provider's infrastructure is heavily overloaded.
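This figure is easy to cross-check on Linux by sampling the aggregate CPU counters in /proc/stat; a minimal sketch follows (illustrative only; the 97% above presumably came from a standard tool such as top, vmstat or iostat).

   #include <stdio.h>
   #include <unistd.h>

   /* Read the aggregate "cpu" line from /proc/stat:
      user nice system idle iowait irq softirq steal guest guest_nice */
   static int read_cpu(unsigned long long v[10])
   {
      FILE *f = fopen("/proc/stat", "r");
      if (!f) { perror("fopen /proc/stat"); return -1; }
      int n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                     &v[0], &v[1], &v[2], &v[3], &v[4],
                     &v[5], &v[6], &v[7], &v[8], &v[9]);
      fclose(f);
      return n >= 5 ? 0 : -1;                /* need at least user..iowait */
   }

   int main(void)
   {
      unsigned long long a[10] = {0}, b[10] = {0}, total = 0;

      if (read_cpu(a) != 0) return 1;
      sleep(5);                              /* sample interval */
      if (read_cpu(b) != 0) return 1;

      for (int i = 0; i < 10; i++)
         total += b[i] - a[i];
      if (total == 0) return 1;

      /* index 4 is the iowait counter */
      printf("iowait: %.1f%%\n", 100.0 * (b[4] - a[4]) / (double)total);
      return 0;
   }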


BTW, I ran these tests at a local VPS provider, so don't blame me for yesterday's AWS EBS mayhem… ;)



 

Here is a histogram with overlaid density plot.

Some of your plots show multiple modes and striations in the data that do not appear with this data set. You need to be able to explain the differences.

Bob


On Wednesday, October 17, 2012 5:15:52 AM UTC-7, Rodrigo Campos wrote:


DrQ

Jan 21, 2013, 1:48:13 PM
to guerrilla-cap...@googlegroups.com
Hi Rodrigo,

Looks like you've done a lot of nice work collecting and analyzing all those benchmark results.

Please pardon my dumbidity, but what the hell is "kernel density?" I cannot find a definition anywhere on your page. Any unconventional performance metrics should be clearly defined and related to more familiar performance metrics.


--njg

On Monday, January 21, 2013 10:27:10 AM UTC-8, Rodrigo Campos wrote:
Just giving a heads-up: I've published a new set of results using the Provisioned IOPS EBS volumes.

It seems that Amazon has put a good deal of effort into addressing the latency and performance fluctuation problems.

Provisioned IOPS volumes really do deliver more consistent, higher-performance IO.

That being said, you'll still get some annoying variance on larger volumes. It seems that as volumes grow and data is more and more scattered among the underlying disks, your volumes will be more susceptible to performance fluctuations.

Please let me know your thoughts; I'll be more than happy to answer any questions.


Best,
