Re: Segue issue - createCluster() doesn't complete, Amazon bills me!

James Long

Apr 24, 2014, 10:25:27 AM
to Nan Pond, seg...@googlegroups.com
Nan, it sounds like something is going wrong in the boot scripts. Not your fault. It's likely the result of either the latest R version not loading properly or something changing on AWS. You're correct that if the machines load, AWS will likely charge you for a bit of usage. There's no way around this, as far as I know. I always do my testing with medium-sized instances and only 1 or 2 instances.

I've not been actively using Segue for over a year now and am considering retiring the project. Sorry that I can't offer any quick fixes. 

-JD




On Thu, Apr 24, 2014 at 11:03 AM, Nan Pond <nanc...@gmail.com> wrote:
Hi JD-
I spent a while yesterday trying to run segue on my Macbook Pro.

Running myCluster <- createCluster(numInstances=2)
 would display a few STARTING messages, and then proceed to shut down again.

Ordinarily, I would just move on and try another approach for my computation, and chalk this up to a package in dev being used by someone who doesn't understand exactly what she's doing.

However, I am fairly certain that Amazon AWS is going to charge me for all the instances it started and then closed without me getting to use them, since they're appearing on my usage report.  Since there's $$ at stake here, I don't really want to continue playing with the createCluster() function and end up paying for every failed attempt to run that line of code, but I feel like now I'm (marginally) financially invested in trying to use your package. 

Is this a typical problem?  What can I do to troubleshoot without billing myself a few cents every time I try?

Thanks,
Nan Pond

--

Nan Pond, PhD

Postdoctoral Research Scientist
Michigan Technological University


Patrick Costello

Jul 17, 2014, 4:23:32 AM
to seg...@googlegroups.com, nanc...@gmail.com
Hi JD,
I've just been trying out Segue, as I'm looking for a simple way to run parallel R jobs in the cloud and it looks like a great solution.
I'm running into a problem similar to Nan's, where the cluster starts but then bootstrapping fails and everything exits.
From your reply it seems like you don't have a lot of time to look into issues, so I was wondering if you knew of any other solutions similar to Segue for running parallel R jobs in the cloud?
If not, I'll try forking the code and fixing the issues.
Cheers
Patrick

James Long

Jul 17, 2014, 12:34:20 PM
to seg...@googlegroups.com, Nan Pond
Patrick,

Yep, it's impossible for me to get enough time to manage Segue and
keep it working these days.

I've had good luck using Starcluster, although mostly with Python. I
see evidence that others are using Starcluster as a backend for R:
http://stackoverflow.com/questions/14636950/r-and-snow-on-amazon-ec2-using-starcluster

The 'killer feature' of Segue, in my not at all humble opinion :) is
how simple it was to configure and get up and running. Everything else
I see tends to have more of a startup time cost.

Good luck and if you discover a workable solution, please let me and
this list know. I need to start redirecting folks to something other
than Segue.

Cheers,

James

Patrick Costello

Jul 23, 2014, 6:58:11 PM
to seg...@googlegroups.com, nanc...@gmail.com
Thanks James, I'll check out Starcluster as I use Python a lot as well; maybe some combo with Rpy will work.
And yeah, the ease of getting Segue up and running was massively attractive, particularly for research people I know who want to spend minimal time learning IT and maximum time looking at results.
I'll keep you posted on anything promising I find.

Patrick

James Long

Jul 26, 2014, 11:48:34 AM
to seg...@googlegroups.com
Thanks Patrick.

-J

Zach

Feb 5, 2015, 11:02:00 AM
to seg...@googlegroups.com, nanc...@gmail.com
Would you mind if I copied the code over to a GitHub repo (keeping the MIT license of course) and started working on segue2?

I have a couple of ideas I want to try out:

http://www.omegahat.org/RAmazonS3/ for reading/writing to S3
RProtoBuf::serialize_pb for serializing data

These may or may not be good ideas. =D

James Long

Feb 5, 2015, 11:10:55 AM
to seg...@googlegroups.com, nanc...@gmail.com
Please do! Hack away! 

-J



Zach

Feb 6, 2015, 10:32:31 AM
to seg...@googlegroups.com, nanc...@gmail.com
I was able to convert the mercurial repo to git using hg-git:

So far neither idea has worked out:
Protocol buffers don't support serializing all R objects, and the serialized objects seem to be larger than caTools::base64encode(serialize(...)).
RAmazonS3 doesn't support sending commands to EMR. It seems the Java API is really the way to go here. I updated to aws-java-sdk-1.9.17, which is 67 MB. I need to see if there's a way to strip everything out of it but the S3 and EMR tools.

So far, I've successfully set up travis-ci and coveralls for the repo, but haven't started writing any unit tests. I don't think we'll be able to test all the Amazon-based functions (I don't want to give travis my Amazon credentials, hahah), but we should be able to at least test some of the file input-output functions to make sure serialization and de-serialization work correctly.
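
For what it's worth, here is a minimal sketch of what such a round-trip test could look like without touching AWS. It uses only base R; the helpers write_object() and read_object() are hypothetical names made up for illustration, not functions from Segue, and the base64 layer mentioned above is left out.

```r
# Round-trip check: serialize an R object to a file, read it back, compare.
# write_object() and read_object() are hypothetical helpers, not Segue's API.
write_object <- function(obj, path) {
  raw_bytes <- serialize(obj, connection = NULL)  # returns a raw vector
  writeBin(raw_bytes, path)
  invisible(length(raw_bytes))
}

read_object <- function(path) {
  raw_bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  unserialize(raw_bytes)
}

# A few representative objects; extend with whatever the package ships around.
test_objects <- list(
  numbers    = c(1.5, 2, NA),
  frame      = data.frame(x = 1:3, y = c("a", "b", "c"),
                          stringsAsFactors = FALSE),
  model_like = list(coef = c(a = 1, b = 2), call = quote(lm(y ~ x)))
)

# TRUE for each object that survives the write/read cycle unchanged.
round_trip_ok <- vapply(test_objects, function(obj) {
  path <- tempfile(fileext = ".bin")
  on.exit(unlink(path))
  write_object(obj, path)
  identical(read_object(path), obj)
}, logical(1))

print(round_trip_ok)
```

Something like this could run on travis without any credentials, since it only exercises local file input and output.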

James Long

Feb 6, 2015, 10:42:17 AM
to seg...@googlegroups.com, nanc...@gmail.com
Wow, great progress!

There have been a LOT of EMR API changes since Segue was written.

-J



Benjamin Harvey

Feb 25, 2015, 4:38:12 PM
to seg...@googlegroups.com, nanc...@gmail.com
Zach,

Have you been able to produce a stable version of Segue2 that utilizes the more recent versions of R? 