spot instances


Ian Fiske

Aug 2, 2011, 9:39:05 AM
to Segue for R
Hi all,

Segue looks awesome and I hope to use it in some applications, but I
have a quick question. Is there any way to specify to use spot
instances when creating a cluster with makeCluster()? I nearly always
use spot instances as they are much more cost effective.

Thank you,
Ian

James (JD) Long

Aug 2, 2011, 11:30:10 AM
to seg...@googlegroups.com
Ian, I've not built in support for spot instances. Are spot instances
supported with EMR?

-j

Ian Fiske

Aug 2, 2011, 10:43:16 PM
to Segue for R
Hi James,

Thanks for the quick reply. You're right -- it looks like EMR is not
yet available for spot instances. However, this is in public beta
(https://forums.aws.amazon.com/thread.jspa?threadID=43553), so
hopefully it will be available soon. So I guess my request was a bit
premature.

Ian

Radek Maciaszek

Aug 30, 2011, 1:42:49 PM
to seg...@googlegroups.com
Hello,

I've just started playing with Segue today, and the question about spot instances also crossed my mind. Spot instances on EMR have just come out of beta: http://aws.typepad.com/aws/2011/08/run-amazon-elastic-mapreduce-on-ec2-spot-instances.html

In my experience with the beta, using spot instances lowered the bill by about 50%, so it would be great to have them in Segue.

Thanks,
Radek

James (JD) Long

Aug 30, 2011, 2:23:30 PM
to seg...@googlegroups.com
I think it's a great idea to get spot instances in Segue. I've not
researched the API call changes needed to make it work. It's certainly
on the roadmap. If you hack together a change in the Segue source to
get spot instances working, I'd be happy to roll it in!

Thanks,

-J

Radek

Aug 30, 2011, 5:19:02 PM
to seg...@googlegroups.com
Can't promise anything but I will try to look at it :)

Thanks,
Radek

James (JD) Long

Aug 30, 2011, 5:24:04 PM
to seg...@googlegroups.com
I figure the basic syntax has to pass the following:

availability zone
bid price
desired nodes
???

What I'd need to know in order to add this to Segue is simply what
additional info has to be added to the call. Also, does the workflow
change? Can jobs start on EMR then stop when all the nodes get shut
down because pricing went above the bid? Do they restart? The workflow
implications are not obvious to me as I've not read about how spot
instances are implemented in EMR.

Even if you don't write the code, at least doing the research for the
above items would save me a LOT of time.

Thanks for your interest in Segue!

-J

Radek

Oct 5, 2011, 5:31:54 AM
to seg...@googlegroups.com
Hi,

I've just added the changes required for using spot instances. Normally I would prepare a patch, but since some extra jar libraries had to be added, I decided to create a new build. If you replace the lib jars and overwrite the other files, Mercurial should hopefully pick up the changes for you. But I can do the patch if you prefer. GitHub has an option to send pull requests to the main contributor, but I am not aware of such a feature on Google Code.

I tried to make the method/variable names consistent with existing code but feel free to ask me to do any changes if you think they are necessary. Here is the updated code:

And the working example:

library(segue)

myCluster <- createCluster(numInstances=10,
                           location="us-east-1b",
                           masterInstanceType="c1.xlarge",
                           slaveInstanceType="c1.xlarge",
                           masterBidPrice="0.68",
                           slaveBidPrice="0.68")

Here masterBidPrice is the bid price for the master server and slaveBidPrice is the bid price for each core (slave) server. If you want, for example, an on-demand master instance and spot slave instances, all you need to do is define only slaveBidPrice. If you define neither bid price, on-demand instances will be created.

The downside of using spot instances is that if the current market price rises above your bid price, you may lose some of your machines. Spot instances seem quite popular these days, so the wait before the cluster starts is somewhat longer. The upside is a cost saving of 50% or more.

If necessary, I can extend the API in the future to use a combination of on-demand and spot instances. For very long and costly analyses, this would give better protection against the price volatility of spot instances. I usually bid slightly above the on-demand price, and so far I have hardly ever had any issues. Note that even if you specify a higher price, you always pay the current market price for the given availability zone.
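The billing rule described above (you pay the market price, not your bid, and you lose the machine once the market price rises above the bid) can be illustrated with a small R sketch. The function spotRunCost() is hypothetical and is not part of Segue or any AWS API, and the prices are made up. It models a single instance under simplifying assumptions: the market price changes once per hour, each hour at or below the bid is charged at the market price, and the instance is lost, with nothing charged, in the first hour the market price exceeds the bid.

```r
# Hypothetical helper, not part of Segue or the AWS API.
# Simplified model of one spot instance: each element of hourlyPrices is the
# market price for one hour in a single availability zone.
spotRunCost <- function(bidPrice, hourlyPrices) {
  # Segue's example passes bids as strings such as "0.68"
  bid <- as.numeric(bidPrice)
  # First hour in which the market price rises above the bid (NA if never)
  lostAt <- which(hourlyPrices > bid)[1]
  # If the bid is never exceeded, the instance runs for every hour
  hoursRun <- if (is.na(lostAt)) length(hourlyPrices) else lostAt - 1
  # You pay the market price (not your bid) for each hour the instance ran;
  # assumption: nothing is charged for the hour in which it is lost
  cost <- sum(hourlyPrices[seq_len(hoursRun)])
  list(hoursRun = hoursRun, cost = cost)
}

# Illustrative prices only: the third hour exceeds the 0.68 bid
spotRunCost("0.68", c(0.30, 0.35, 0.90, 0.40))
```

In this made-up run the instance survives two hours and costs 0.30 + 0.35 rather than 2 x 0.68, and it is lost in the third hour, when the market price passes the bid.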

Best,
Radek

James Long

Oct 5, 2011, 7:57:23 AM
to seg...@googlegroups.com
That's fantastic, Radek. I'll try to look at the code today and see the best way to integrate it into Segue. Thanks!

-J

Radek

Oct 12, 2011, 5:55:13 AM
to seg...@googlegroups.com
Glad to help, and thank you for releasing the package as open source in the first place! Let me know if you need any help integrating the code.

Best,
Radek

James Long

Oct 13, 2011, 5:16:53 PM
to seg...@googlegroups.com
I'm running a little behind as I've been sick... but it's forthcoming!

-J

Radek

Nov 16, 2011, 8:33:07 AM
to seg...@googlegroups.com
Hi J,

In case it makes things easier for you, I've just added my patch to a clone of the Segue repository:

Best,
Radek

James Long

Nov 16, 2011, 9:40:25 AM
to seg...@googlegroups.com
It does!

Thanks.

-J



James Long

Nov 24, 2011, 5:30:05 PM
to seg...@googlegroups.com
Radek,

Since I'm not so much into football, Thanksgiving gave me a chance to
figure out how to merge back your changes. Totally easy, once I
figured it out :) I had never done that before.

I'm going to do some testing and then push your changes out to the
Segue site. Your putting this in Google Code really made it easy for
me to absorb your changes. Thank you.

-J

Timothy Dalbey

Nov 24, 2011, 5:32:20 PM
to seg...@googlegroups.com
Hehe! I am also doing some hadoop/R programming on the sofa while the
rest of my relatives are jeering at the boob tube.

Let's go team {variable_name}!

Radek Maciaszek

Nov 29, 2011, 6:11:36 AM
to seg...@googlegroups.com
Hi James,

Great news. Let me know if you need any more help with this, whether everything works for you, or if you have any suggestions about the code.

Best,
Radek