Neil et al,
I read with interest the Hadoop scalability paper - thanks for the work.
As we all know, measuring and computing scalability is complex enough for a single distributed system such as Hadoop.
I am faced with the challenge of modeling scalability for a complex application that includes multiple such distributed platforms: specifically, a custom-built app, Cassandra, and Solr.
The custom app pumps data into Cassandra, and Solr is used to run queries on that data. The goal is to ensure that query response time does not degrade as the data size increases (with a corresponding scale-up of the Cassandra and Solr configurations).
Publicly, there are some results from measuring Cassandra scalability and fewer on Solr. We also have some internal results for the custom app, which scales really well as long as the receiving end (Cassandra) doesn't become a bottleneck.
We can take a few measurements of the system processing data sets of different sizes while meeting the targeted query response time.
The question is: how can we extend the USL (or build a new model) to help predict scalability for the entire system?
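To make the question concrete, here is a minimal sketch of the single-system fit I would start from. It assumes the usual USL form C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1)), treats the whole pipeline (custom app + Cassandra + Solr) as one black box, and takes N to be some overall scale factor such as node count. All of that is an assumption on my part and not something from the paper. The function names are hypothetical and the numbers are synthetic, generated from the formula itself, not measurements.

```python
import math

def usl(n, alpha, beta):
    """Relative capacity C(N) under the USL with contention alpha and coherency beta."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))

def fit_usl(sizes, throughputs):
    """Least-squares estimate of (alpha, beta) from (N, X(N)) pairs.

    Assumes one measurement at N = 1 (used to normalise) and at least
    two distinct sizes with N > 1. Uses the linearised form
        N / C(N) - 1 = alpha * (N - 1) + beta * N * (N - 1)
    and solves the 2x2 normal equations directly.
    """
    x1 = throughputs[sizes.index(1)]
    saa = sab = sbb = say = sby = 0.0
    for n, x in zip(sizes, throughputs):
        c = x / x1              # relative capacity C(N) = X(N) / X(1)
        a = n - 1               # regressor multiplying alpha
        b = n * (n - 1)         # regressor multiplying beta
        y = n / c - 1           # linearised response
        saa += a * a
        sab += a * b
        sbb += b * b
        say += a * y
        sby += b * y
    det = saa * sbb - sab * sab
    alpha = (say * sbb - sab * sby) / det
    beta = (saa * sby - sab * say) / det
    return alpha, beta

def peak_size(alpha, beta):
    """Scale at which the fitted USL curve peaks: sqrt((1 - alpha) / beta)."""
    return math.sqrt((1.0 - alpha) / beta)

if __name__ == "__main__":
    # Synthetic illustration only: throughputs generated from made-up coefficients.
    sizes = [1, 2, 4, 8, 16]
    throughputs = [100.0 * usl(n, 0.05, 0.001) for n in sizes]
    alpha, beta = fit_usl(sizes, throughputs)
    print("alpha =", alpha, "beta =", beta, "peak N =", peak_size(alpha, beta))
```

With real data the inputs would be our few whole-system measurements. What I don't see is how the separate Cassandra, Solr and custom-app behaviour should enter such a fit, which is really the heart of my question.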
Thanks much for this group's insights.