Looks like you've done a lot of nice work collecting and analyzing all those benchmark results.
Please pardon my dumbidity, but what the hell is "kernel density?" I cannot find a definition anywhere on your page. Any unconventional performance metrics should be clearly defined and related to more familiar performance metrics.
It seems that Amazon has put a good deal of effort into the latency and performance fluctuation problems.
Provisioned IOPS volumes really do deliver more consistent, higher-performance IO.
That being said, you'll still get some annoying variance on larger volumes. It seems that as volumes grow and data is more and more scattered among the underlying disks, your volumes will be more susceptible to performance fluctuations.
Please let me know your thoughts; I'll be more than happy to answer any questions.
On Tuesday, October 23, 2012 8:32:42 AM UTC-2, Rodrigo Campos wrote:
Thanks for your message, I'll do my best to answer your questions. :)
Little's Law shows a consistent 2560 jobs in the system if Total Time is in seconds? Did you use 2560 threads of execution?
The unit is operations/second. It's computed simply by dividing the total number of operations (or bytes) by the number of wallclock seconds it took to perform them, as measured with gettimeofday().
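A minimal sketch of that calculation in C. The helper names `elapsed_seconds()` and `ops_per_second()` are hypothetical (this is not IOMelt's source); the sketch only assumes two `gettimeofday()` samples taken around the timed loop plus a count of operations or bytes.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: wallclock seconds between two gettimeofday() samples. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Operations (or bytes) per wallclock second; returns 0 if no time elapsed. */
double ops_per_second(size_t count, const struct timeval *start,
                      const struct timeval *end)
{
    double secs = elapsed_seconds(start, end);
    return secs > 0.0 ? (double)count / secs : 0.0;
}
```

In use, one would call `gettimeofday(&start, NULL)` before the loop and `gettimeofday(&end, NULL)` after it, then pass the number of completed calls (or bytes) as `count`.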
IOMelt is a single-process, single-threaded application. It's not a "stress test" in itself, although it might generate a considerable amount of disk IO. So the answer is no.
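Little's Law makes the same point from the other direction. A worked sketch, assuming each call is issued only after the previous one returns: at most one request is in the system at any moment, so the average number in the system L cannot exceed 1, and the average time per call W is bounded by the reciprocal of the throughput. For example, at 400 calls/s:

$$
L = \lambda W, \qquad L \le 1 \;\Rightarrow\; W \le \frac{1}{\lambda}, \qquad W \approx \frac{1}{400\ \text{calls/s}} = 2.5\ \text{ms}
$$

So for this workload a value like 2560 cannot be a count of requests in flight.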
You should replot the interactions (with both log and linear y-axes) and color-code the Test type.
There is some periodicity in the Serial Write data: an influential hit every 1648 seconds, and less influential ones every 832 seconds and 660 seconds. 1648 seconds is nearly a harmonic of 832 seconds.
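One way to check those periods is the sample autocorrelation at each candidate lag; peaks near 660, 832 and 1648 would confirm them (note that 2 × 832 s = 1664 s, about 1% away from 1648 s). A minimal C sketch, assuming the samples are evenly spaced in time so that a lag in samples maps directly to a lag in seconds; `autocorr()` is a hypothetical helper, not part of IOMelt.

```c
#include <stddef.h>

/* Sample autocorrelation of x[0..n-1] at the given lag, in [-1, 1].
   Returns 0 when the lag is out of range or the series is constant. */
double autocorr(const double *x, size_t n, size_t lag)
{
    if (n == 0 || lag >= n) return 0.0;

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;

    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        den += d * d;                               /* total variance   */
        if (i + lag < n) num += d * (x[i + lag] - mean); /* lagged covariance */
    }
    return den > 0.0 ? num / den : 0.0;
}
```

In practice one would evaluate it for every lag up to a few thousand seconds and look for local maxima.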
Might try this later as I'm not on my own machine right now.
ISPs frequently oversubscribe customers, so I would expect some uncontrollable time-of-day effects due to internet traffic.
One of the ISPs I've tested shows consistent IO contention for read and write operations at 03:00 AM every single day, probably due to a backup or other maintenance procedure.
I used locator to pick the approximate center points of the Serial Read data, then used date -r to convert the numerical values to PDT:
iomelt robertlane$ date -r 1344754940
Sun Aug 12 00:02:20 PDT 2012
iomelt robertlane$ date -r 1345101995
Thu Aug 16 00:26:35 PDT 2012
iomelt robertlane$ date -r 1345370444
Sun Aug 19 03:00:44 PDT 2012
iomelt robertlane$ date -r 1345529140
Mon Aug 20 23:05:40 PDT 2012
They do not occur at the same time every day.
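The conversion `date -r` performs can also be done in C with `gmtime_r()`/`localtime_r()` and `strftime()`, which makes it easy to tabulate the hour of day of each event when looking for a daily pattern. A minimal sketch; `format_epoch()` and `epoch_hour()` are hypothetical helpers, and the `utc` flag exists only so the result does not depend on the machine's time zone (with `utc == 0` they follow the process's `TZ`).

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Format a Unix timestamp like `date -r` does, minus the zone name.
   Returns the number of characters written, or 0 on failure. */
size_t format_epoch(time_t t, int utc, char *buf, size_t len)
{
    struct tm tm;
    if ((utc ? gmtime_r(&t, &tm) : localtime_r(&t, &tm)) == NULL) return 0;
    return strftime(buf, len, "%a %b %d %H:%M:%S %Y", &tm);
}

/* Hour of day (0-23) of a timestamp, or -1 on failure. */
int epoch_hour(time_t t, int utc)
{
    struct tm tm;
    if ((utc ? gmtime_r(&t, &tm) : localtime_r(&t, &tm)) == NULL) return -1;
    return tm.tm_hour;
}
```

For example, 1344754940 (Sun Aug 12 00:02:20 PDT) is 07:02:20 in UTC, since PDT is UTC-7.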
Before you get lost in the details of the individual trees, you should answer some questions:
Why are random rewrites faster than serial writes?
Random rewrites are performed on an already existing file, while serial writes are performed on a brand-new, empty file. Probably the time spent allocating new blocks for the growing file (and updating its inode metadata) is what makes serial writes take longer. I need to dig deeper into this, but this is the most likely cause.
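The difference between the two patterns can be observed with `fstat()`: only the serial pass on a fresh file makes it grow (and normally increases `st_blocks`), while the rewrite pass overwrites blocks that already exist. A minimal sketch under simplifying assumptions: 4 KB blocks, a hypothetical helper `write_blocks()`, and a backwards walk standing in for random offsets. It is not how IOMelt itself is written, and copy-on-write file systems may allocate new blocks even on rewrites.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define BLOCK 4096 /* assumed block size for this sketch */

/* rewrite == 0: create/truncate the file and write `count` blocks serially.
   rewrite != 0: overwrite existing blocks in place (backwards walk).
   On success stores the final fstat() result in *st and returns 0. */
int write_blocks(const char *path, int count, int rewrite, struct stat *st)
{
    static const char buf[BLOCK]; /* zero-filled block */
    int flags = rewrite ? O_WRONLY : (O_WRONLY | O_CREAT | O_TRUNC);
    int fd = open(path, flags, 0600);
    if (fd == -1) return -1;

    for (int i = 0; i < count; i++) {
        off_t off = rewrite ? (off_t)(count - 1 - i) * BLOCK : (off_t)i * BLOCK;
        if (pwrite(fd, buf, BLOCK, off) != BLOCK) { close(fd); return -1; }
    }
    int rc = fstat(fd, st);
    close(fd);
    return rc;
}
```

Comparing `st_blocks` after each call (and timing them) would show whether allocation is really where the serial-write time goes.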
Why are random rereads slower than serial reads?
Random rereads include an lseek() call to a random position before every read(), and this is what makes them take longer.
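A minimal sketch of that access pattern, assuming 4 KB blocks and `rand()` for the offsets (IOMelt's actual block size and random source may differ); `random_reread()` is a hypothetical name. Each iteration pays for an extra `lseek()` system call on top of the `read()`, and the scattered offsets also defeat the read-ahead a serial scan benefits from.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 4096 /* assumed block size for this sketch */

/* lseek() to a random block-aligned offset, then read() one block, `count`
   times. Returns the number of blocks read (fewer than `count` if a call
   failed), or -1 if the file cannot be opened or holds no full block. */
long random_reread(const char *path, long count)
{
    char buf[BLOCK];
    struct stat st;
    long done = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    long nblocks = fstat(fd, &st) == 0 ? (long)(st.st_size / BLOCK) : 0;
    if (nblocks <= 0) { close(fd); return -1; }

    for (long i = 0; i < count; i++) {
        off_t off = (off_t)(rand() % nblocks) * BLOCK;
        if (lseek(fd, off, SEEK_SET) == (off_t)-1) break; /* the extra syscall */
        if (read(fd, buf, BLOCK) != BLOCK) break;
        done++;
    }
    close(fd);
    return done;
}
```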
Why is there such a large gap in read and write times?
IOMelt tries to overcome file system cache effects by using posix_fadvise(), but there's no way to disable caching in other layers like the hypervisor or the underlying hardware. If I comment out the posix_fadvise() call in the source, the gap you noticed grows considerably, since the read operations become even faster.
Here's the code that tries to avoid caching:
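/* *TRY* to minimize buffer cache effect */
/* There's no guarantee that the file will be removed from buffer cache though */
/* Keep in mind that buffering will happen at some level */
#ifdef __APPLE__
/* macOS has no posix_fadvise(); a non-zero arg to F_NOCACHE turns data caching off */
rc = fcntl(fd, F_NOCACHE, 1);
if (rc == -1) perror("main() - [__APPLE__] fcntl F_NOCACHE error");
/* A zero arg to F_RDAHEAD turns read-ahead off */
rc = fcntl(fd, F_RDAHEAD, 0);
if (rc == -1) perror("main() - [__APPLE__] fcntl F_RDAHEAD error");
#else
/* posix_fadvise() returns an error number instead of setting errno; the advice value is assumed */
rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
if (rc != 0) fprintf(stderr, "main() - Unable to use posix_fadvise() on workload file: %s\n", strerror(rc));
#endif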
Write operations usually aren't affected that much by caches: O_SYNC is used as an open() flag, which causes the operating system to try to commit every single write() call to the underlying hardware.
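A minimal sketch of a synchronous write loop, assuming a hypothetical helper `sync_write_file()`. With `O_SYNC`, each `write()` returns only once the data (and the metadata needed to retrieve it) has been transferred to the underlying hardware, which is why the page cache has far less influence on write timings; any volatile cache in the device or hypervisor is still outside the program's control.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Create/truncate `path` with O_SYNC and write `len` bytes of `data`
   `times` times. Returns 0 on success, -1 on the first failure. */
int sync_write_file(const char *path, const char *data, size_t len, int times)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd == -1) return -1;
    for (int i = 0; i < times; i++) {
        /* each call blocks until the data reaches the device */
        if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }
    }
    return close(fd);
}
```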
One commonly used way to overcome caches is to increase the workload file size to something that would not fit in normal caches. I haven't done this because my main interest was to understand whether AWS EBS IO was consistent over time, not its capacity.
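A sketch of that sizing rule, assuming `sysconf(_SC_PHYS_PAGES)` is available (it is on Linux and macOS, but POSIX does not require it) and an arbitrary multiple of physical memory chosen by the caller; `workload_size()` is a hypothetical helper. It only rules out the guest's own page cache, not caches in the hypervisor or storage backend.

```c
#include <stdint.h>
#include <unistd.h>

/* Suggested workload file size in bytes: `factor` times the guest's
   physical memory, so the file cannot fit in its page cache.
   Returns 0 if the memory size cannot be determined. */
uint64_t workload_size(unsigned factor)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0) return 0;
    return (uint64_t)pages * (uint64_t)page_size * factor;
}
```

On a VM with 1 GB of memory, `workload_size(2)` would suggest roughly a 2 GB file.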
I just ran a similar test on a VPS to demonstrate the results with different file sizes. The virtual machine has 1 GB of total memory; in the first run the file size is 50 MB, and in the second it is 1.17 GB:
[Results for the 50 MB and 1.17 GB runs, in calls/s: Serial Write, Serial Read, Random Rewrite, Random Reread, Random Mixed Read/Write]
Serial write is the least affected, yet even serial write throughput is almost three times higher with the smaller file.
I was amazed by the other results, though. I expected lseek() to be a problem with the larger file, but did not expect it to be such a big one.
This is what amazed me the most:
Finished all tests
Total wallclock time: 3672.165418
CPU user/system time: 1.095833/31.895151
I will definitely double-check this, but I was seeing more than 97% iowait during the tests, which indicates that the provider's infrastructure is heavily overloaded.
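For context, the CPU time in that output (1.10 s user + 31.90 s system, about 33 s) is under 1% of the 3672 s of wallclock time, so the process spent nearly all of its time waiting on IO. On Linux, the iowait share can be computed from two samples of the aggregate `cpu` line of `/proc/stat`, whose fields are user, nice, system, idle, iowait, irq, softirq and steal (in clock ticks). A minimal sketch that takes the lines as strings; `parse_cpu()` and `iowait_pct()` are hypothetical helpers, and only those eight fields are counted.

```c
#include <stdio.h>

/* Parse the aggregate "cpu" line of /proc/stat into 8 counters.
   Missing trailing fields (older kernels) are left at 0. Returns 0 on success. */
int parse_cpu(const char *line, unsigned long long f[8])
{
    for (int i = 0; i < 8; i++) f[i] = 0;
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6], &f[7]);
    return n >= 5 ? 0 : -1; /* need at least up to the iowait field */
}

/* Percentage of CPU time spent in iowait between two samples, or -1.0. */
double iowait_pct(const char *before, const char *after)
{
    unsigned long long a[8], b[8], total = 0;
    if (parse_cpu(before, a) != 0 || parse_cpu(after, b) != 0) return -1.0;
    for (int i = 0; i < 8; i++) total += b[i] - a[i];
    return total ? 100.0 * (double)(b[4] - a[4]) / (double)total : -1.0;
}
```

In practice each string would be the first line of `/proc/stat` read a few seconds apart; `vmstat` (the `wa` column) and `iostat` (`%iowait`) report the same figure.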
BTW, I ran these tests on a local VPS provider, so don't blame me for yesterday's AWS EBS mayhem… ;)
Here is a histogram with an overlaid density plot.
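Since the term came up earlier: a kernel density estimate is a smoothed histogram. Each observation contributes a small bump (the kernel K) of width h (the bandwidth), and the estimate at a point x is f(x) = (1 / (n h)) * sum_i K((x - x_i) / h). A minimal C sketch, assuming the Epanechnikov kernel K(u) = 0.75 (1 - u^2) for |u| < 1 and a caller-chosen h; plotting tools such as R's `density()` default to a Gaussian kernel and pick h automatically, so the exact curve will differ.

```c
#include <stddef.h>

/* Epanechnikov kernel: 0.75 * (1 - u^2) on (-1, 1), zero elsewhere. */
static double epanechnikov(double u)
{
    return (u > -1.0 && u < 1.0) ? 0.75 * (1.0 - u * u) : 0.0;
}

/* Kernel density estimate at x from n samples with bandwidth h > 0. */
double kde(const double *samples, size_t n, double h, double x)
{
    if (n == 0 || h <= 0.0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += epanechnikov((x - samples[i]) / h);
    return sum / ((double)n * h); /* normalized so the curve integrates to 1 */
}
```

Evaluating `kde()` on a grid of x values gives the smooth curve drawn over the histogram; data that sits at two distinct levels produces two separate peaks.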
Some of your plots show multiple modes and striations in the data that do not appear with this data set. You need to be able to explain the differences.
On Wednesday, October 17, 2012 5:15:52 AM UTC-7, Rodrigo Campos wrote:
The same instance type behaves differently depending on the region it runs in. This is particularly critical if you depend on multiple regions for disaster recovery or geographical load balancing.
One could argue that you can now use EBS Optimized instances to overcome this "problem"; I haven't tested these instances yet, and not every instance type has this feature available.
Generally speaking, performance is better and more consistent in the South America region than in Virginia. This is probably because the SA region was the most recently deployed. Maybe the SA datacenter uses newer server models or is simply underutilized, but this is a wild guess.
Write performance for the medium instance type in Virginia decays abruptly, dropping from almost 400 calls/s to around 300 calls/s. This is not very clear in the scatter plot, but a time-based chart shows the pattern clearly. This is the main reason you see two spikes in the density chart.
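A time-ordered view can be produced with a trailing moving average over the per-interval rates; a level shift such as the drop from roughly 400 to 300 calls/s then shows up as a step instead of as two density peaks. A minimal sketch; `moving_average()` and the window size are assumptions, not part of the original analysis.

```c
#include <stddef.h>

/* Trailing moving average: out[i] = mean of x[max(0, i-w+1) .. i].
   Keeps a running sum, so it is O(n) regardless of the window size. */
void moving_average(const double *x, size_t n, size_t w, double *out)
{
    double sum = 0.0;
    if (w == 0) w = 1;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        if (i >= w) sum -= x[i - w];          /* drop the sample leaving the window */
        size_t count = i + 1 < w ? i + 1 : w; /* shorter window at the start */
        out[i] = sum / (double)count;
    }
}
```

A wider window hides short spikes but makes the step easier to see; the 1648 s periodicity mentioned above would need a window well below that length to survive.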
Read performance on the SA small instance shows similar behavior.
Small instances definitely should not be used for disk-IO-bound applications, since their behavior is rather erratic even for read operations. This is particularly true in the Virginia region.
I've done the very same tests on several VPS providers here in Brazil and found some disturbing results. In one case the read and write performance simply plummets at 03:00AM, probably due to backup or maintenance procedures.