Creating a reusable profile for a multi-node Apache Spark cluster

dh...@umn.edu

Apr 7, 2021, 2:52:24 AM
to cloudlab-users
Hello

I am a new CloudLab user. I am looking for a simple way to re-run experiments without having to reinstall software and packages each time, specifically for multi-node cluster experiments. I read through the documentation on creating profiles (which take disk snapshots) and on storing all software and data in a directory such as /local/ rather than the user home directory, but that approach seems to work only for single-node experiments. How can I do the same for a multi-node cluster? For example, I want to run experiments with Apache Spark (along with HDFS, which requires a Hadoop setup). Setting up a multi-node Spark cluster (one master and multiple slaves) is very time-consuming, and I would like to capture the setup in some sort of profile so that I can simply instantiate it whenever I need to run experiments.
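For reference, the kind of thing I am imagining is a parameterized geni-lib profile script along the lines of the sketch below. This is just a rough sketch: the disk-image URN, the MyProject project name, and the setup.sh script are placeholders I made up, not a working configuration.

import geni.portal as portal
import geni.rspec.pg as pg

pc = portal.Context()
# Let the number of workers be chosen at instantiation time.
pc.defineParameter("worker_count", "Number of Spark workers",
                   portal.ParameterType.INTEGER, 3)
params = pc.bindParameters()

request = pc.makeRequestRSpec()
lan = request.LAN("lan0")

# One master plus N workers, all booted from the same (hypothetical)
# custom disk image with Hadoop/Spark pre-installed, produced by
# snapshotting a manually configured node.
for i in range(params.worker_count + 1):
    name = "master" if i == 0 else "worker%d" % i
    node = request.RawPC(name)
    node.disk_image = "urn:publicid:IDN+emulab.net+image+MyProject//spark-base"
    iface = node.addInterface("if%d" % i)
    lan.addInterface(iface)
    # Startup command runs on every boot; setup.sh is a placeholder
    # script (e.g. kept in the profile's repository) that would start
    # the HDFS and Spark daemons for this node's role.
    node.addService(pg.Execute(shell="bash",
                               command="/local/repository/setup.sh %s" % name))

pc.printRequestRSpec(request)

Is something like this the right approach, or is there a better-supported way to snapshot an entire multi-node setup?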

Can someone please help me in this regard? This has become a major bottleneck in my research work, and any help would be greatly appreciated.

Thanks
Dhruv

dh...@umn.edu

Apr 8, 2021, 12:45:02 PM
to cloudlab-users
Any pointers on this one?