examples using "otherBootstrapActions"?


JC

Dec 26, 2012, 3:40:40 AM
to seg...@googlegroups.com
Hi,

I am new to segue, and I need to install additional packages on my EMR Linux clusters before running my R code. I would like to schedule the installation of these packages with the 'otherBootstrapActions' argument of segue's createCluster() function. However, I cannot get this to work without a syntax error. Are there any working examples of the otherBootstrapActions argument that I could look at? Thanks.

JC

JC

Dec 26, 2012, 3:45:38 AM
to seg...@googlegroups.com
Just to clarify, the packages I need to install beforehand are NOT R packages, but standard open source software (e.g. Debian packages) such as "gdal-bin".

JC

James Long

Dec 26, 2012, 10:10:01 AM
to seg...@googlegroups.com
JC, I don't have an example handy, but I contacted Q McCallum to see if he does. We'll whip something up either way. The important point is to make sure you're passing it a list of lists. Do you have an example of what you tried?

-J


JC

Dec 26, 2012, 3:29:46 PM
to seg...@googlegroups.com
Hi,

I tried
 

myCluster <- createCluster(numInstances=2,location="us-east-1b",copy.image = FALSE, otherBootstrapActions=list(intall.apt="apt-get install gdal-bin proj-bin libproj-dev libgdal1-dev", wgetrgdal="wget http://cran.r-project.org/web/packages/rgdal/rgdal_0.7-18.tar.gz", install_rgdal="R CMD INSTALL configure-args=with-proj-include=/usr/local/lib rgdal_0.7-18.tar.gz"))

I know it is wrong, but I do not know how to pass in the three other actions.


Best,

James Long

Dec 28, 2012, 8:02:28 AM
to seg...@googlegroups.com
JC,

I traded emails with Q and he gave me this good narrative and example
for how to use the otherBootstrapActions:


- let's say you've stored a shell script in S3, under
s3://beermuda/scripts/mySuperDuperSetup.sh, and you want to pass it a
couple of values on the commandline

- pass that S3 path to otherBootstrapActions:

createCluster(
    ... ,
    otherBootstrapActions=list(
        list(
            s3file="s3://beermuda/scripts/mySuperDuperSetup.sh" ,
            args=c( "firstScriptArg" , "secondScriptArg" , "andSoOn" )
        )
    )
)

- and, when the cluster launches, Amazon will make sure
mySuperDuperSetup.sh is run on every node.

Now, the next question is, "what goes in mySuperDuperSetup.sh?" One
answer would be, "a bunch of commands to install third-party
tools/libraries you'd want on the nodes": say, prerequisite libraries
for packages listed in cranPackages, or even commands to install
packages that are _not_ on CRAN, such as in-house products.
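
As a rough illustration of what such a script might contain, here is a
minimal sketch built from the three commands JC listed earlier in the
thread. Several things here are assumptions rather than anything confirmed
in this thread: that the nodes are Debian-based with apt-get, that the
bootstrap user can use sudo without a password, and that the rgdal URL
from JC's message is still valid. The "run"/dry-run switch and the
function names are hypothetical conveniences, not part of segue or EMR.

```shell
#!/bin/sh
# Hypothetical bootstrap script (a sketch, not a tested EMR recipe).
# Assumes Debian-based nodes with apt-get and passwordless sudo.
set -eu

# The first command-line argument selects the mode. Anything other than
# "run" (including no argument at all) only prints the commands.
MODE="${1:-dry-run}"

# Execute a command, or just print it when not in "run" mode.
run() {
    if [ "$MODE" = "run" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Tarball name and URL are taken from JC's message; the URL may be stale.
RGDAL_TARBALL="rgdal_0.7-18.tar.gz"
RGDAL_URL="http://cran.r-project.org/web/packages/rgdal/$RGDAL_TARBALL"

# System libraries that rgdal needs at build time.
install_system_packages() {
    run sudo apt-get update
    # -y answers prompts automatically; nobody is at the keyboard here.
    run sudo apt-get install -y gdal-bin proj-bin libproj-dev libgdal1-dev
}

# Download the rgdal source tarball and install it into R.
install_rgdal() {
    run wget "$RGDAL_URL"
    run sudo R CMD INSTALL "$RGDAL_TARBALL"
}

install_system_packages
install_rgdal
```

Running the script with no arguments prints the commands it would
execute, which is a cheap way to check it locally. To use it on a
cluster, one would upload it to S3 and reference it from
otherBootstrapActions following the s3file/args pattern shown above, with
"run" as the first entry of args so that the commands are actually
executed.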

JC

Dec 29, 2012, 3:26:55 PM
to seg...@googlegroups.com
Hi,

Thank you so much! I will try.

I tried revising the bootstrapAction.sh file in the library folder, but installing packages that way is a pain because of all the configuration problems. I guess that is just a general problem.

Best,
JC

James Long

Dec 31, 2012, 8:21:16 AM
to seg...@googlegroups.com
You are correct that it's a tough problem.

One path that would be nice is if EMR supported the use of custom
AMIs. Then I could create a general purpose AMI with all the latest R
goodness and have Segue use that. And we could create an option for
users to fork that AMI and add packages that they like and have Segue
use their custom AMI. This would make all our lives a LOT easier.

-J

