I'm just writing to report on my experience using Starcluster, which enables the use of NumPy and SciPy in the Amazon EC2 cloud computing environment. The purpose of my email is to extol Starcluster's qualities and to suggest that the NumPy community be aware of its development. I suspect there are others in the community who find cloud computing an attractive idea but a little daunting to get into, and who would be pleasantly surprised at how easy Starcluster makes it to get started using NumPy on Amazon EC2.
For those of you who aren't familiar with AMIs and the Amazon EC2 service, see e.g. http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud. Three of the basic concepts are "Amazon Machine Images" (AMIs), "machine instances" of AMIs, and the Elastic Block Store (EBS) service. AMIs are disk images containing a virtual machine, including an operating system and any other software you add. Instances are temporarily allocated computers, booted with your chosen virtual machine, that you start up on demand, use for computations with the software from the AMI, and then terminate. EBS is Amazon's persistent storage service, which provides permanent file systems in the cloud. You allocate an EBS volume of a given size, attach it to a running machine instance just like any other hard drive, and use it to store the files you use or create, both during the computation and later, whenever you start up a new instance.
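To make the persistence point concrete, here is a minimal sketch of checkpointing a NumPy computation onto an attached EBS volume, so that a later instance can pick up where an earlier one stopped. It assumes the volume is mounted at /data (a hypothetical path; use wherever you actually mounted yours), the "work" done in run() is a placeholder, and it is written for a modern Python 3 and NumPy rather than the Python 2.6 on the AMIs described below.

```python
import os
import numpy as np

# Hypothetical mount point of the attached EBS volume; adjust to your setup.
EBS_MOUNT = "/data"


def checkpoint_path(run_name, base_dir=EBS_MOUNT):
    """Return the path of the checkpoint file for a named run."""
    return os.path.join(base_dir, run_name + ".npz")


def save_checkpoint(run_name, step, state, base_dir=EBS_MOUNT):
    """Write the current step number and state array to the volume."""
    os.makedirs(base_dir, exist_ok=True)
    path = checkpoint_path(run_name, base_dir)
    np.savez(path, step=step, state=state)
    return path


def load_checkpoint(run_name, base_dir=EBS_MOUNT):
    """Return (step, state) saved by a previous instance, or None if there is none."""
    path = checkpoint_path(run_name, base_dir)
    if not os.path.exists(path):
        return None
    with np.load(path) as data:
        return int(data["step"]), data["state"].copy()


def run(run_name, n_steps, base_dir=EBS_MOUNT):
    """Advance a toy computation to n_steps, resuming from any saved checkpoint."""
    resumed = load_checkpoint(run_name, base_dir)
    step, state = resumed if resumed is not None else (0, np.zeros(3))
    while step < n_steps:
        state = state + 1.0  # placeholder for one step of real work
        step += 1
        save_checkpoint(run_name, step, state, base_dir)
    return step, state
```

Because the checkpoint lives on the EBS volume rather than on the instance's own disk, terminating the instance does not lose it; calling run() again on a new instance with the same volume attached resumes from the last saved step.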
A couple of weeks ago I wrote to this list asking for advice on finding a good Amazon Machine Image (AMI) for using NumPy and SciPy on Amazon's cloud. I didn't want to have to build a Linux machine image with optimized BLAS and LAPACK myself, and I figured that there might be good existing publicly available AMIs that I could use as a base. Robert Kern suggested that I look into the Starcluster project (http://web.mit.edu/stardev/cluster/).
I have found Starcluster extremely useful. In the course of one day, it took me from knowing essentially nothing about cloud computing to running large-scale parallel clusters with my favorite NumPy/SciPy scripts.
The basis of what Starcluster offers is two solidly built AMIs. They run Ubuntu Jaunty and come with prebuilt optimized BLAS and LAPACK, NumPy, SciPy, matplotlib, IPython, and several other useful packages for scientific computing in Python. They use Python 2.6 and come in both 32-bit and 64-bit flavors. The AMIs are based on AMIs from Alestic (http://alestic.com/) and are built following best practices for ensuring stability and good interaction with Amazon's system. They have proved very stable and extensible.
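One quick way to check that NumPy on an instance really is using an optimized BLAS is to look at its build configuration and time a large matrix product. The sketch below does both; it targets a current Python 3 and NumPy (default_rng needs NumPy 1.17 or later), and the matrix size is an arbitrary choice. There is no universal threshold for "fast", so compare the number against the same script on a machine whose BLAS you already know.

```python
import time
import numpy as np


def matmul_gflops(n=1000, repeats=3, seed=0):
    """Estimate double-precision matrix-multiply throughput in GFLOP/s."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)  # dispatched to the BLAS dgemm routine NumPy was built with
        best = min(best, time.perf_counter() - start)
    # A dense n x n matrix product costs about 2 * n**3 floating-point operations.
    return 2.0 * n ** 3 / best / 1e9


if __name__ == "__main__":
    np.show_config()  # lists the BLAS/LAPACK libraries NumPy was built against
    print("matmul: %.1f GFLOP/s" % matmul_gflops())
```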
In addition to these AMIs, Starcluster has three extremely useful features:
-- Built-in support for mounting EBS drives as NFS filesystems, and then administering the shared drive across multiple machine instances.
-- The Sun Grid Engine (SGE), a queuing system for scheduling jobs to be run in parallel across instances
-- A Python module with a few commands that give you an incredibly simple interface for automating the process of starting/terminating a cluster of instances, mounting the shared drive, starting the grid engine, etc., and for configuring your cluster's needs (e.g. how many nodes it will contain, which AMIs to use, which EBS volumes to mount, etc.).
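As an illustration of the kind of job SGE schedules, the sketch below generates a minimal SGE array-job script. qsub's -t option makes SGE run N copies of the script, and each copy sees its own index in the SGE_TASK_ID environment variable. The job name and the worker script name (worker.py) are hypothetical placeholders; only the #$ directives (-N, -S, -cwd, -j, -t) are standard qsub options.

```python
def make_array_job_script(job_name, n_tasks, worker="worker.py", python="python"):
    """Return the text of an SGE array-job script that runs `worker` once per task."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        "#$ -N %s" % job_name,   # job name shown by qstat
        "#$ -S /bin/bash",       # shell SGE uses to run the script
        "#$ -cwd",               # run in the directory qsub was called from
        "#$ -j y",               # merge stdout and stderr into one log file
        "#$ -t 1-%d" % n_tasks,  # array job: tasks numbered 1..n_tasks
        # Each task receives its own index in SGE_TASK_ID.
        '%s %s "$SGE_TASK_ID"' % (python, worker),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_array_job_script("numpy_sweep", 20), end="")
```

On the cluster you would save the generated text to a file such as job.sh (a hypothetical name) and submit it from the master node with "qsub job.sh".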
As a result, all you have to do to have a NumPy-enabled cluster-on-demand is:
1) Get an Amazon EC2 account and the accompanying security credentials (X.509 certificate and SSH keypair) for your account.
2) Install starcluster ("easy_install starcluster")
3) Follow the installation procedure on the Starcluster website for creating, attaching, and formatting an EBS volume to serve as the cluster's NFS-shared drive.
4) Set up your starcluster configuration file.
5) Start a 1-node cluster, modify the installation as you see fit, and re-bundle the result into a new AMI as described on the Amazon website (http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/). (Don't forget to edit your Starcluster configuration file to reflect your new AMI.) This step is optional: if you don't need anything else special, you can just use Starcluster's base images.
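For step 4: in later Starcluster releases the configuration file is an INI-style file, and the sketch below builds one with Python's configparser so the structure is explicit. The section and option names are modelled on the layout documented for those later (0.9x) releases, so the release described in this email may well use different names; check the documentation for the version you install. Every value is a placeholder, not a real key, AMI, or volume ID, and the template, key, and volume names (smallcluster, mykey, mydata) are hypothetical.

```python
import configparser
import io


def make_starcluster_config(cluster_size=2, image_id="ami-XXXXXXXX",
                            volume_id="vol-XXXXXXXX", mount_path="/data"):
    """Return INI text for a one-template cluster config (names are assumptions)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names upper-case as written
    cfg["global"] = {"DEFAULT_TEMPLATE": "smallcluster"}
    cfg["aws info"] = {
        "AWS_ACCESS_KEY_ID": "YOUR_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY": "YOUR_SECRET_ACCESS_KEY",
        "AWS_USER_ID": "YOUR_USER_ID",
    }
    cfg["key mykey"] = {"KEY_LOCATION": "~/.ssh/mykey.rsa"}
    cfg["cluster smallcluster"] = {
        "KEYNAME": "mykey",
        "CLUSTER_SIZE": str(cluster_size),
        "NODE_IMAGE_ID": image_id,  # your re-bundled AMI from step 5, if any
        "VOLUMES": "mydata",
    }
    cfg["volume mydata"] = {"VOLUME_ID": volume_id, "MOUNT_PATH": mount_path}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(make_starcluster_config())
```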
After that, starting a cluster is as easy as typing a single command ("starcluster -s"). To submit parallel jobs on your cluster, you can learn to use the Sun Grid Engine "qsub" command (http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman1/qsub.html) or use the Python DRMAA bindings to SGE (http://code.google.com/p/drmaa-python/). Or, if you like Parallel Python, that works perfectly well on these clusters too.
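When submitting with qsub, a common pattern is an array job ("qsub -t 1-N"), in which SGE runs N copies of the same script and sets SGE_TASK_ID and SGE_TASK_LAST in each. The worker sketch below uses those variables to pick its share of a NumPy parameter sweep. The sin(x)**2 computation is a stand-in for real work, and the partitioning (the same split numpy.array_split uses) is one reasonable choice, not something SGE prescribes. For non-array jobs SGE sets SGE_TASK_ID to the string "undefined", which env_int treats as the default.

```python
import os
import numpy as np


def env_int(name, default):
    """Read a positive integer from the environment, falling back to default."""
    value = os.environ.get(name, "")
    # SGE_TASK_ID is "undefined" outside array jobs, so only accept digits.
    return int(value) if value.isdigit() else default


def task_slice(task_id, n_tasks, n_items):
    """Return the [start, stop) range of items handled by 1-based task_id."""
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be in 1..n_tasks")
    # Same split as numpy.array_split: the first n_items % n_tasks tasks get one extra.
    base, extra = divmod(n_items, n_tasks)
    i = task_id - 1
    start = i * base + min(i, extra)
    stop = start + base + (1 if i < extra else 0)
    return start, stop


def run_task(task_id, n_tasks, params):
    """Compute results for this task's share of the parameter array."""
    start, stop = task_slice(task_id, n_tasks, len(params))
    chunk = params[start:stop]
    return chunk, np.sin(chunk) ** 2  # stand-in for the real per-parameter work


if __name__ == "__main__":
    task_id = env_int("SGE_TASK_ID", 1)
    n_tasks = env_int("SGE_TASK_LAST", 1)  # assumes tasks numbered 1..N, step 1
    params = np.linspace(0.0, 1.0, 101)
    chunk, result = run_task(task_id, n_tasks, params)
    # In practice you would np.save the result onto the shared NFS/EBS drive.
    print("task %d handled %d parameters, sum = %.4f"
          % (task_id, len(chunk), result.sum()))
```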
Overall, in my experience, Starcluster has been easy, stable and powerful, and I encourage anyone who is curious about cloud computing with NumPy to look into it.
Starcluster is by no means a finished project. At the moment, you can only administer one cluster at a time from a given local machine, since Starcluster has no notion of a "session" and can't distinguish between the different clusters you've started. (You can start multiple clusters, but then any starcluster commands you type in your local terminal may get confused about which Amazon machine instances you're referring to, so it has trouble administering them.) Also, there's no dynamic load balancing or resizing: once you've started a cluster with a certain number of nodes, you're stuck with that number of computers while the cluster is running, even if you're only using a few of them or suddenly need more.
The developer of the project (Justin Riley) says on his website that he's planning to add these features in the next release. Now, I'm not the creator, developer, or maintainer of Starcluster, and I have no affiliation with Justin Riley or the project whatsoever, so I want to make it clear that I don't speak for them in any way except as a satisfied user. I don't know what his commitment to his development plans is, either; however, I hope he sticks to his timeline, as I think continued vigorous development of his project would be a real plus for the NumPy community. I'm hoping that if others in the NumPy community like his project and start using it, that will add to the likelihood of continued development. (If anyone from the NumPy community is interested in helping the developer out, perhaps you should consider shooting him an email.)
Anyhow, I apologize for this long email, and hope it may be of use to somebody!
Dan