Is it possible to add a node to an existing experiment?

peter.wa...@gmail.com

Apr 8, 2021, 12:00:58 PM
to cloudlab-users

Hi Mike and Leigh,
I have an experiment with a node that stores 20TB+ of data, and I need to run some computation on that data. What is the best way to do this?
My current approach is to create a new experiment with some compute nodes and mount the data over NFS, but throughput is bottlenecked by the network at about 100MB/s (and it consumes a lot of control-network bandwidth).

Mike Hibler

Apr 8, 2021, 12:23:32 PM
to cloudla...@googlegroups.com
No, you definitely should not use NFS on the control net.

What is the name of your storage experiment?
> --
> You received this message because you are subscribed to the Google Groups
> "cloudlab-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to cloudlab-user...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/
> cloudlab-users/efceaf88-cc2e-4f2b-8c2d-3aaefa1e788an%40googlegroups.com.

Juncheng Yang

Apr 8, 2021, 12:25:41 PM
to cloudla...@googlegroups.com
Hi Mike,
The experiment name is a1a1a11a-QV94296

Mike Hibler

Apr 8, 2021, 1:10:07 PM
to cloudla...@googlegroups.com
You are going to have to explain a bit better what it is you are trying
to do. What you have now is probably not what you want.

Currently you have allocated a machine that has 270TB of local storage but
you are not using any of it. Instead you have two remote iSCSI datasets and
a RAM disk.

Why are your compute nodes in a different experiment?

Juncheng Yang

Apr 8, 2021, 1:17:20 PM
to cloudla...@googlegroups.com
Thank you, Mike!

We are processing a large dataset (20-50TB). To avoid repeatedly transferring it from campus (I only got 20MB/s when I transferred the data to CloudLab, and it took a long time), I reserved the storage node to store the data temporarily.

I do not need the compute nodes all the time (sometimes I am debugging and the nodes sit idle), so rather than allocating them in the same experiment, I allocate them when I need them. I have been trying to do the processing locally on the storage node, but some of the computation requires more than 100GB of DRAM and cannot be done there.

Any suggestions on how to do this better?

Mike Hibler

Apr 8, 2021, 1:29:18 PM
to cloudla...@googlegroups.com
My mistake, you are using the local disks. I come from a BSD background,
not Linux, and "md" means memory disk to me! :-) You are really taking your
chances with a 44-disk RAID0, though! If you do this again, I would suggest
a RAID10 unless you need the entire capacity.
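
A rough way to see why a 44-disk RAID0 is risky, as a sketch under simplifying assumptions: each disk fails independently with the same probability p over some period, RAID10 is laid out as 22 mirrored pairs, and no rebuild happens during that period. The value p = 0.02 and the helper names are hypothetical illustrations, not measured failure rates for these disks.

```python
def raid0_survival(n_disks: int, p_fail: float) -> float:
    """RAID0 has no redundancy, so the array survives only if every disk survives."""
    return (1 - p_fail) ** n_disks


def raid10_survival(n_disks: int, p_fail: float) -> float:
    """RAID10 as n_disks/2 mirrored pairs: a pair is lost only if BOTH disks fail,
    and the array is lost if any single pair is lost."""
    pairs = n_disks // 2
    return (1 - p_fail ** 2) ** pairs


def usable_capacity(raw_tb: float, level: str) -> float:
    """RAID0 keeps all raw capacity; RAID10 keeps half because every block is mirrored."""
    return {"raid0": raw_tb, "raid10": raw_tb / 2}[level]


P = 0.02  # hypothetical per-disk failure probability over the period of interest
print(f"RAID0  (44 disks): {raid0_survival(44, P):.3f}")
print(f"RAID10 (44 disks): {raid10_survival(44, P):.3f}")
```

With these assumed numbers the RAID0 array survives only about 41% of the time while RAID10 survives about 99%; the price is half the raw capacity, which is the trade-off behind "unless you need the entire capacity".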

There are a couple of possible alternatives that will require some background
work on our part. Stay tuned...

Juncheng Yang

Apr 8, 2021, 1:33:34 PM
to cloudla...@googlegroups.com
Oh, changing to RAID10 is a good suggestion! I had almost forgotten about the possibility of disk errors and failures! :)
Thank you, Mike!

Mike Hibler

Apr 8, 2021, 2:54:02 PM
to cloudla...@googlegroups.com
Okay, here is what you can do. You had an otherwise unused LAN in your
storage server experiment, so we have converted that into a so-called
"shared VLAN". This means you can attach to the VLAN across experiments
and it will use the 10Gb experiment fabric rather than the 1Gb control
network fabric.
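
For a sense of scale, here is a back-of-the-envelope sketch of the best-case time to read the dataset once over each fabric. It assumes the link runs at full line rate with no NFS/TCP overhead and that the disks are not the bottleneck, so real numbers will be worse; 20 TB (decimal units) is the low end of the dataset size mentioned in this thread, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(n_bytes: float, link_bits_per_sec: float) -> float:
    """Best-case hours to move n_bytes over a link running at full line rate."""
    bytes_per_sec = link_bits_per_sec / 8  # e.g. 1 Gb/s is at most 125 MB/s
    return n_bytes / bytes_per_sec / 3600


DATASET_BYTES = 20e12  # 20 TB, decimal units

for name, rate in [("1Gb control network", 1e9), ("10Gb experiment fabric", 10e9)]:
    print(f"{name}: {transfer_hours(DATASET_BYTES, rate):.1f} h")
```

This works out to about 44 hours on 1Gb versus about 4.4 hours on 10Gb. The 125 MB/s ceiling of a 1Gb link is also consistent with the NFS mount topping out near 100MB/s.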

The only thing you need to do on your current server is to change
/etc/exports so that your export line reads:

/disk 10.10.1.0/24(rw,sync,no_subtree_check)

This will export your filesystem ONLY on the experiment network and
not over the control network.
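
The export line above restricts the share to the experiment subnet. A small sketch of the same idea using Python's standard `ipaddress` module, which builds the entry and checks which client addresses it covers; the helper names are hypothetical, and 192.0.2.10 is a documentation-range address standing in for a control-network IP.

```python
import ipaddress


def export_line(path: str, subnet: str, options: list[str]) -> str:
    """Build an /etc/exports entry that shares `path` with one subnet only."""
    net = ipaddress.ip_network(subnet)  # raises ValueError if the subnet is malformed
    return f"{path} {net}({','.join(options)})"


def is_exported_to(client_ip: str, subnet: str) -> bool:
    """True if client_ip falls inside the exported subnet."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(subnet)


line = export_line("/disk", "10.10.1.0/24", ["rw", "sync", "no_subtree_check"])
print(line)
print(is_exported_to("10.10.1.2", "10.10.1.0/24"))   # a compute node on the shared VLAN
print(is_exported_to("192.0.2.10", "10.10.1.0/24"))  # hypothetical control-network address
```

After editing /etc/exports, running `exportfs -ra` on the server re-reads the file and applies the new export list.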

In your "compute" experiment profile, you will need to join all your
nodes in a LAN and then do:

lan.connectSharedVlan("a1a1a11a-nfs")

to add in the storage server. Since the storage server has IP 10.10.1.1,
you should explicitly assign IP addresses in that subnet to all the
interfaces in your compute experiment (you can do this in the profile).
Then you will need to modify your client-side NFS setup so that it uses
the server's 10.10.1.1 address and NOT the address of the node on the
control network.
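
A minimal geni-lib sketch of such a compute profile, assuming two compute nodes. The node names (`compute1`, `compute2`), the LAN name `nfs-lan`, `NUM_NODES`, and numbering clients from 10.10.1.2 are hypothetical illustrations; only the shared VLAN name and the 10.10.1.0/24 subnet come from this message. It has not been tested against this experiment.

```python
import geni.portal as portal
import geni.rspec.pg as pg

# Hypothetical node count; each entry is a separate physical node request.
NUM_NODES = 2

pc = portal.Context()
request = pc.makeRequestRSpec()

# One LAN joining every compute node; it is attached to the shared VLAN below.
lan = request.LAN("nfs-lan")

for i in range(NUM_NODES):
    node = request.RawPC("compute%d" % (i + 1))
    iface = node.addInterface("if1")
    # Static address in the storage server's subnet; start at .2 because
    # the storage server already uses 10.10.1.1.
    iface.addAddress(pg.IPv4Address("10.10.1.%d" % (i + 2), "255.255.255.0"))
    lan.addInterface(iface)

# Attach this LAN to the shared VLAN created from the storage experiment.
lan.connectSharedVlan("a1a1a11a-nfs")

pc.printRequestRSpec(request)
```

On the compute nodes, the NFS mount would then name the server as 10.10.1.1 (for example `10.10.1.1:/disk`) rather than its control-network hostname.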

Let us know if you have problems.

Juncheng Yang

Apr 8, 2021, 3:07:47 PM
to cloudla...@googlegroups.com
Awesome, Mike! Thank you very much!
I will give it a try when servers are available (currently there is only one dss7500 available :( )

Juncheng Yang

Apr 9, 2021, 4:31:36 PM
to cloudla...@googlegroups.com
Hi Mike,
I think there is a bug in my profile. I keep getting “Not enough free nodes because of policy restrictions, or existing resource reservations to other projects”, but I don’t know what I did wrong. Can you help me? Thank you!

The profile I am using is “compute” and the experiment is a1a1a11a-QV95944.

Leigh Stoller

Apr 9, 2021, 5:10:16 PM
to cloudla...@googlegroups.com
Hi. Just like the message says, there are not enough free nodes at Clemson.
Click on the Experiments menu, then on Resource Availability.

At the time you tried, there was one node available, and your profile wants two.
Perhaps you could make a reservation for later?

Leigh

Juncheng Yang

Apr 9, 2021, 6:24:07 PM
to cloudla...@googlegroups.com
Oh, I didn’t realize I had requested two nodes… (I changed the profile recently.)
It works perfectly now. Thank you, Leigh!