use shared ivy cache on network drive? (v0.12.3, if that matters)

Todd O'Bryan

Jun 11, 2013, 9:01:28 PM
to simple-b...@googlegroups.com
I have a fairly weird setup. I teach high school comp sci and have a room full of fat clients (no hard drives--they download enough of the OS to boot and then pull in what they need as they need it) with all the users' home folders stored on a server.

When my senior-year class (roughly the equivalent of college sophomores in terms of CS background) uses SBT to create a project, every single student has to download all the dependencies to his/her ~/.ivy2/cache folder. That's a potential problem for two reasons: (1) we're wasting a lot of space to store 30 copies of the same dependencies on the same disk drive, and (2) our bandwidth is very spotty, so it can be very frustrating to get stuff installed.

What I'd like to do is set up a shared Ivy Cache on the network that would get mounted along with the user's home folder. I think this will work fine once things are downloaded, but I'm a little concerned at what might happen if multiple students all try to update at the same time, since they'd be fighting with each other over who gets to fetch the dependency and store it in the cache.

On the other hand, this should be no different than me running two different sbt update processes for different projects at the same time, because both of them would try to use my ~/.ivy2/cache folder and would have to not step on each other's toes.

So, does anyone know a reason why this wouldn't work or have concerns that I should check on?

Also, where is the best place to override the -ivy option for everybody on the server? Should I unjar the sbt.jar, make the changes, and then re-jar it, or is there a less obnoxious way to deal with it?

I have the summer to try to figure this out, but I'll have students working on projects some, so I will have time to experiment a little.

Thanks!
Todd

Robin Green

Jun 12, 2013, 3:40:06 AM
to simple-b...@googlegroups.com
On Wednesday, 12 June 2013 02:01:28 UTC+1, Todd O'Bryan wrote:
> I have a fairly weird setup. I teach high school comp sci and have a room full of fat clients (no hard drives--they download enough of the OS to boot and then pull in what they need as they need it) with all the users' home folders stored on a server.
>
> When my senior-year class (roughly the equivalent of college sophomores in terms of CS background) uses SBT to create a project, every single student has to download all the dependencies to his/her ~/.ivy2/cache folder. That's a potential problem for two reasons: (1) we're wasting a lot of space to store 30 copies of the same dependencies on the same disk drive,

Is this really a problem? Disk space is very cheap these days. My ivy cache is currently 600MB, which works out as literally pennies, if you look at how much hard drives cost.
 
> and (2) our bandwidth is very spotty, so it can be very frustrating to get stuff installed.
>
> What I'd like to do is set up a shared Ivy Cache on the network that would get mounted along with the user's home folder. I think this will work fine once things are downloaded, but I'm a little concerned at what might happen if multiple students all try to update at the same time, since they'd be fighting with each other over who gets to fetch the dependency and store it in the cache.

I'd be concerned about that too. I suggest instead you set up a school Artifactory server, that would act a bit like a caching proxy server (though unlike a caching proxy it would actually store artifacts permanently). See my answer here: http://stackoverflow.com/a/10887323/495796

Mark Harrah

Jun 12, 2013, 9:49:10 AM
to simple-b...@googlegroups.com
On Tue, 11 Jun 2013 18:01:28 -0700 (PDT)
Todd O'Bryan <toddo...@gmail.com> wrote:

> I have a fairly weird setup. I teach high school comp sci and have a room
> full of fat clients (no hard drives--they download enough of the OS to boot
> and then pull in what they need as they need it) with all the users' home
> folders stored on a server.
>
> When my senior-year class (roughly the equivalent of college sophomores in
> terms of CS background) uses SBT to create a project, every single student
> has to download all the dependencies to his/her ~/.ivy2/cache folder.
> That's a potential problem for two reasons: (1) we're wasting a lot of
> space to store 30 copies of the same dependencies on the same disk drive,
> and (2) our bandwidth is very spotty, so it can be very frustrating to get
> stuff installed.
>
> What I'd like to do is set up a shared Ivy Cache on the network that would
> get mounted along with the user's home folder. I think this will work fine
> once things are downloaded, but I'm a little concerned at what might happen
> if multiple students all try to update at the same time, since they'd be
> fighting with each other over who gets to fetch the dependency and store it
> in the cache.

One possibility is to have a shared filesystem repository. The cache won't copy the jars because it is a local repository.
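
A minimal sketch of what pointing a build at such a shared filesystem repository could look like, assuming sbt 0.12.x and an Ivy-style layout. The resolver name "school-shared" and the mount point /srv/shared/ivy-repo are hypothetical and do not come from this thread:

```scala
// build.sbt (sketch only; the resolver name and path below are assumptions)
// Resolver.file declares a repository that lives on the filesystem,
// here a directory on the shared network mount.
resolvers += Resolver.file("school-shared", file("/srv/shared/ivy-repo"))(Resolver.ivyStylePatterns)
```

Note that a repository like this is not a cache: something has to put the artifacts there first (for example by publishing to it), so it would need to be populated before the class starts resolving against it.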

> On the other hand, this should be no different than me running two
> different sbt update processes for different projects at the same time,
> because both of them would try to use my ~/.ivy2/cache folder and would
> have to not step on each other's toes.
>
> So, does anyone know a reason why this wouldn't work or have concerns that
> I should check on?

One reason it might not work as well as locally is that file locking doesn't always work on network drives.

> Also, where is the best place to override the -ivy option for everybody on
> the server? Should I unjar the sbt.jar, make the changes, and then re-jar
> it, or is there a less obnoxious way to deal with it?

The startup script can change the sbt.boot.properties that is used if you prefer. Rejarring might not be a bad solution though.

-Mark

> I have the summer to try to figure this out, but I'll have students working
> on projects some, so I will have time to experiment a little.
>
> Thanks!
> Todd

Todd O'Bryan

Jun 13, 2013, 5:48:39 PM
to simple-build-tool
Thank you both for the info. We just tried updating about 10 computers
today, and the network in the room ground to a halt.

One question about creating our own local repo...Is there a place to
tell SBT at school to use that repo that is outside the project, so
that when students are working at home, it won't try to connect to a
repo that's not there?
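
One way this might be handled, sketched under assumptions: sbt 0.12.x loads any *.sbt file placed directly in ~/.sbt/ as global settings (sbt 0.13 moved this to ~/.sbt/0.13/), so the resolver can live in the school home folder rather than in the project. The file name school.sbt, the resolver name, and the path are all hypothetical:

```scala
// ~/.sbt/school.sbt (sketch; file name, resolver name, and path are assumptions)
// Add the shared repository only when its directory is actually mounted,
// so a machine without the mount simply skips it.
resolvers ++= {
  val shared = file("/srv/shared/ivy-repo")
  if (shared.isDirectory)
    Seq[Resolver](Resolver.file("school-shared", shared)(Resolver.ivyStylePatterns))
  else
    Seq.empty[Resolver]
}
```

Because the file sits in the school home folder and not in the project, a project copied to a home machine would not carry the setting with it; the directory check is only an extra guard.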

As for cost of storage, I've been putting the home folders on a RAID
array of Velociraptors, since the disk I/O is one of the bottlenecks
for the lab (and I can't afford SCSI disks). I have about 150 students
who use the lab, and have 250GB of disk space for home folders, so
when a student starts creeping up to a gig or so, I start to get
nervous.

Thanks again!
Todd

Ian Weisberger

Nov 27, 2013, 2:51:55 AM
to simple-b...@googlegroups.com
If cost is a major concern, why don't you stick an inexpensive SATA disk in each of the machines and then set up a parallel network filesystem like http://www.gluster.org/ across all of the machines? That's going to get you orders of magnitude more throughput than a single RAID box with a few raptors.

If that's not an option, you could look into running a de-duplicating FS on the NAS box. BTRFS has decent de-duplication for Linux.