One point they demonstrated is that because you pay simply for minutes of virtual machine use, one virtual machine running for ten minutes costs the same as ten virtual machines finishing the job in one minute. Regardless of how many CPUs they split the work across, both earthquake detection methods processed a year of the network’s data for less than $5.
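As a rough illustration of that pricing model (the per-minute rate and job size below are made-up numbers for the sketch, not figures from the paper), the total cost stays the same no matter how you split the work:

```python
# Hypothetical illustration of per-minute VM pricing: spreading a fixed amount
# of compute across more machines changes the wall-clock time, not the bill.
# The rate below is invented for the example, not an actual Azure price.
RATE_PER_VM_MINUTE = 0.002   # dollars per VM per minute (hypothetical)
TOTAL_VM_MINUTES = 10        # the job needs 10 VM-minutes of work in total

for n_vms in (1, 2, 5, 10):
    wall_clock_minutes = TOTAL_VM_MINUTES / n_vms
    cost = n_vms * wall_clock_minutes * RATE_PER_VM_MINUTE
    print(f"{n_vms:2d} VMs x {wall_clock_minutes:4.1f} min each = ${cost:.3f}")
```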
The team encourages other researchers to consider the value proposition of working this way:
“The learning curve associated with cloud set-up is steep. But, the results of our scaling tests show that seismic processing in the cloud is both cheap and fast. Since the low cost of cloud computing makes large-scale processing more accessible to the seismic community, the migration of local workflows to the cloud is a worthy endeavor.”
That’s particularly true when the alternative would be building or maintaining local computing resources beyond your own machine. And as more cloud training materials and support become available, the learning curve is getting a bit less steep.
It’s worth noting that this team downloaded their dataset from the NSF SAGE archive, which meant uploading the data to Azure and carefully managing the cost and complexity of storing it there. One of the primary goals of our ongoing migration to cloud data services is that, once our datasets are optimized for cloud storage, you’ll be able to run your analysis in the same cloud system and access the data directly, eliminating the need to move and manage your own copy of a dataset.
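For context, pulling data from the SAGE archive today typically looks something like the ObsPy sketch below; the network, station, and time window are placeholders rather than the channels used in this study, and every byte still has to land on your own machine (or be re-uploaded to your cloud storage) before processing can begin.

```python
# Minimal sketch of downloading waveform data from the NSF SAGE (IRIS) FDSN
# web service with ObsPy. Station codes and the time window are placeholders,
# not the dataset analyzed in the paper.
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("IRIS")  # SAGE archive FDSN endpoint
start = UTCDateTime("2018-01-01T00:00:00")

# Fetch one hour of vertical-component data for a single example station.
stream = client.get_waveforms(network="IU", station="ANMO", location="00",
                              channel="BHZ", starttime=start,
                              endtime=start + 3600)
print(stream)
```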
You can find the full details of their experience, including code and tutorials, in the paper and an associated GitHub repository. If you’d like to learn a little more, we talked with first author Zoe Krauss about the lessons learned from this project. Please check out that conversation in the video below!