Creating a reusable profile for a multi-node Apache Spark cluster

dh...@umn.edu

Apr 7, 2021, 2:52:24 AM
to cloudlab-users
Hello

I am a new CloudLab user. I am looking for a simple way to re-run experiments without having to reinstall software and packages each time, specifically for multi-node cluster experiments. I read through the documentation on creating profiles (which take disk snapshots) and on storing all software and data in a directory such as /local/ rather than the user home directory, but that approach seems to work only for single-node experiments. How can I do the same for a multi-node cluster? For example, I want to run experiments with Apache Spark (along with HDFS, which requires a Hadoop setup). Setting up a multi-node Spark cluster (one master and multiple slaves) is very time-consuming, and I would like to capture the setup in some sort of profile so that I can simply instantiate it whenever I need to run experiments.
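For reference, the kind of thing I am imagining is a parameterized geni-lib profile script along the lines of the sketch below. This is just a rough sketch: the disk-image URN, the MyProject project name, and the setup.sh script are placeholders I made up, not a working configuration.

import geni.portal as portal
import geni.rspec.pg as pg

pc = portal.Context()
# Let the number of workers be chosen at instantiation time.
pc.defineParameter("worker_count", "Number of Spark workers",
                   portal.ParameterType.INTEGER, 3)
params = pc.bindParameters()

request = pc.makeRequestRSpec()
lan = request.LAN("lan0")

# One master plus N workers, all booted from the same (hypothetical)
# custom disk image with Hadoop/Spark pre-installed, produced by
# snapshotting a manually configured node.
for i in range(params.worker_count + 1):
    name = "master" if i == 0 else "worker%d" % i
    node = request.RawPC(name)
    node.disk_image = "urn:publicid:IDN+emulab.net+image+MyProject//spark-base"
    iface = node.addInterface("if%d" % i)
    lan.addInterface(iface)
    # Startup command runs on every boot; setup.sh is a placeholder
    # script (e.g. kept in the profile's repository) that would start
    # the HDFS and Spark daemons for this node's role.
    node.addService(pg.Execute(shell="bash",
                               command="/local/repository/setup.sh %s" % name))

pc.printRequestRSpec(request)

Is something like this the right approach, or is there a better-supported way to snapshot an entire multi-node setup?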

Can someone please help me in this regard? This has become a major bottleneck in my research work, and any help would be greatly appreciated.

Thanks
Dhruv

dh...@umn.edu

Apr 8, 2021, 12:45:02 PM
to cloudlab-users
Any pointers on this one?