Whether there is a "better" solution depends on how much data there is and
what the workflow is.
If this is a one-shot experiment, then it makes sense to create a local FS
on the node and just copy the data in.
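For the one-shot case, a geni-lib profile can allocate a local (ephemeral)
filesystem on the node's spare disk so it is already formatted and mounted
at boot. This is just a sketch; the node name, mount point, and size below
are placeholders you would adjust:

```python
# Sketch: geni-lib profile allocating local ephemeral storage on a node.
# The size and mount point are illustrative placeholders.
import geni.portal as portal
import geni.rspec.pg as pg

pc = portal.Context()
request = pc.makeRequestRSpec()

node = request.RawPC("node0")
# Ask for a local, non-persistent filesystem mounted at /mydata.
bs = node.Blockstore("bs0", "/mydata")
bs.size = "50GB"  # placeholder; size it to your dataset

pc.printRequestRSpec(request)
```

Once the node boots you would copy the data in yourself (e.g., rsync or
scp into /mydata); it is gone when the experiment is terminated.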
If you want to instantiate the experiment again and again over time with
the same dataset, then the best solution depends on the size of the dataset
and whether you want changes to the data to be persistent across
instantiations.
If the actual data is less than around 10GB and you don't need persistence,
then loading the data every time is still reasonable using a CloudLab
"image-backed" dataset. Actually, you can use image-backed datasets even
if you want to persist changes; it just takes extra time and effort to get
the changes saved (i.e., you need to remember to "snapshot" the dataset
before you destroy your experiment).
If you have a huge amount of data, then it would be better to have a
solution where you don't have to populate a node each time. For this you
can use the SAN-based ("short term" and "long term") datasets.
The big caveats here are:
- SAN-based datasets can only be used on the cluster where they are created
- you can currently only create them on Clemson and Apt clusters
- only one node per experiment can attach to the dataset
But with that in mind, you can create a persistent dataset via the portal:
https://www.cloudlab.us/create-dataset.php
and an example profile that uses a persistent dataset:
https://www.cloudlab.us/show-profile.php?uuid=387de1aa-6e13-11e5-96c6-38eaa71273fa
In fact, you can just instantiate the profile, as it will prompt you for
the dataset to use and where to mount it.
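For reference, a profile along those lines can be sketched in geni-lib by
prompting for the dataset URN as a parameter and attaching it with a remote
blockstore. This is an assumption-laden sketch, not that profile's actual
source; the node, link, and mount-point names are placeholders:

```python
# Sketch: geni-lib profile attaching a SAN-based persistent dataset,
# with the dataset URN supplied as a parameter at instantiation time.
import geni.portal as portal
import geni.rspec.pg as pg

pc = portal.Context()
pc.defineParameter("dataset", "URN of the persistent dataset",
                   portal.ParameterType.STRING, "")
params = pc.bindParameters()

request = pc.makeRequestRSpec()
node = request.RawPC("node0")
iface = node.addInterface()

# Remote (SAN-backed) blockstore; note only one node per experiment
# can attach to the dataset.
fsnode = request.RemoteBlockstore("fsnode", "/mydata")
fsnode.dataset = params.dataset

# Link the node to the storage server.
link = request.Link("dslink")
link.addInterface(iface)
link.addInterface(fsnode.interface)

pc.printRequestRSpec(request)
```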
On Fri, Oct 16, 2015 at 10:48:07AM -0500, Brian Kroth wrote: