Thanks for sharing your experiences. I got caught flat-footed by the
Squeeze update too. Soon after, I had some work to finish quickly, so I
fired up Segue only to discover things didn't work! I quickly patched
things up and checked in the changes. One thing I have been doing
really poorly is incrementing version numbers. Right now this requires
changing some build scripts, so I end up not doing it. I need to make
that easier so I'll actually do it. I've also been sloppy/lazy because I
just relocated abroad, and that's been rather time-consuming.

CRAN packages: I have not tested this, so I will set up some tests and
let you know what I find.

Instances per node: I have had the best luck running the same number of
instances per node as the node has processors. So, for example, you
show c1.xlarge instances. Those have 8 virtual cores, so I would run
8 instances per node.

Spot Requests: The spot request features were added by the community.
I have never used them beyond a simple test. Since spot requests
have to go through the Amazon bidding engine, I would guess they might
have delays related to that. I have also heard that spot instances can
get pulled out from under you if the price goes above your bid. I've
not tested that or validated whether it's true.

Run time: Can you cobble up an example where you get a longer run time
on EMR than you get on a local single thread? There are a bunch of
things that can cause this behavior. You have clearly dug into the
code and are probably not making the simple mistakes. If you could
send a dummy example, I'd be happy to try to run it, sleuth through
the log files, etc., and see what I can find. It's possible to ssh into
the master node and look at the Hadoop reporting interface to see which
nodes are doing what. I do that sometimes when I need to really
understand WTF is going on.
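
A dummy example does not need to be elaborate. As a rough sketch only,
assuming a toy Monte Carlo workload (the estimatePi function and myList
below are hypothetical names, not taken from your code), something like
this would give the local single-thread baseline to compare against:

```r
# Hypothetical dummy workload: each list element triggers one
# Monte Carlo estimate of pi from n uniform random points.
set.seed(42)

estimatePi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  # Fraction of points inside the unit quarter circle, times 4.
  4 * mean(x^2 + y^2 <= 1)
}

# 20 tasks of 100,000 draws each (sizes are arbitrary assumptions).
myList <- as.list(rep(1e5, 20))

# Local single-thread baseline.
localTime <- system.time(localResult <- lapply(myList, estimatePi))
print(localTime["elapsed"])

# On an already running Segue cluster (assumed here to be stored in
# myCluster), the comparable call would be along these lines (not run):
# emrTime <- system.time(emrResult <- emrlapply(myCluster, myList, estimatePi))
```

Comparing the two elapsed times on the same list is enough to show
whether the EMR run is really slower. The task count and sample size
are placeholders, so scale them toward your real workload.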

I'm glad Segue is, at least sometimes, useful. It started as a bit of
a hack for me to speed up Monte Carlo sims, but it's grown into
something a bit more generally useful, thanks in large part to
early adopters like you who were willing to struggle alongside me.
Thanks for that.
-J