Force symlinks


Nate Coraor

Sep 1, 2016, 3:01:16 PM
to conda - Public
Hi,

I'm using Conda in CVMFS and am presented with the problem of rapidly growing space usage in the CVMFS repository (filesystem). This is in part because CVMFS does not support cross-directory hardlinks and so installing packages into environments results in multiple copies of the same package contents.

CVMFS is an HTTP-based FUSE-implemented read-only filesystem designed for software distribution which stores the "master" copy of a repository in content-addressable storage (CAS) (i.e. hashed file chunks on a local filesystem) on a host called a stratum 0. Changes are propagated out from stratum 0 servers and ultimately to clients via HTTP, which mount it via FUSE. When you make changes to the repository on the stratum 0 (i.e. beginning a transaction), CVMFS first read-only mounts the CAS using the CVMFS FUSE client. It then uses AUFS to mount a writable filesystem that is unioned with the read-only mount. You make your changes and when done, publish the changes. The difference between the read-write set and the read-only set becomes a snapshot.

This AUFS filesystem supports cross-directory hardlinks just fine. But when the changes are published, cross-directory hardlinks are broken and become multiple copies of the same file.
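To get a rough idea of how much space this costs, one could scan the writable tree before publishing and count the hardlinked files that span more than one directory. The following is only a sketch: the function name `estimate_duplicated_bytes` is made up for illustration, and it assumes that hardlinks within a single directory survive publishing while each additional directory gets its own copy.

```python
import os
import stat
from collections import defaultdict


def estimate_duplicated_bytes(root):
    """Estimate extra bytes stored if cross-directory hardlinks become copies.

    Assumption (not verified against CVMFS internals): links inside one
    directory are preserved, and every additional directory that holds a
    link to the same inode ends up with its own full copy.
    """
    dirs_by_inode = defaultdict(set)  # (device, inode) -> directories holding a link
    size_by_inode = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # Only regular files with more than one link are of interest;
            # lstat() keeps symlinks from being counted as regular files.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                dirs_by_inode[key].add(dirpath)
                size_by_inode[key] = st.st_size
    return sum(
        size_by_inode[key] * (len(dirs) - 1)
        for key, dirs in dirs_by_inode.items()
    )
```

Running it against the conda root (the directory containing both `pkgs` and `envs`) gives an estimate, not an exact figure, since CVMFS stores content as hashed chunks and may compress it.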

Conda falls back to symlinks if hardlinks are impossible, but unfortunately, this doesn't work in the CVMFS case, since conda is able to create hardlinks at runtime and doesn't know they will be broken later.

So, is it possible to force conda to use symlinks? The only config option I can see related to soft/hardlinks is to disable the symlink fallback. I need it to prefer symlinks to hardlinks, not the other way around.
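To make the request concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not conda's actual code; `link_file` and `prefer_symlink` are hypothetical names used only for illustration.

```python
import os


def link_file(src, dst, prefer_symlink=False):
    """Link dst to src and return which kind of link was created.

    Hypothetical helper, not part of conda. With prefer_symlink=False it
    mimics "hardlink first, symlink as fallback"; with prefer_symlink=True
    it skips the hardlink attempt entirely.
    """
    if prefer_symlink:
        os.symlink(os.path.abspath(src), dst)
        return "symlink"
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Hardlink not possible (e.g. different filesystem): fall back.
        os.symlink(os.path.abspath(src), dst)
        return "symlink"
```

In the CVMFS case the `os.link` call succeeds on the AUFS mount, so the fallback never triggers; only something like `prefer_symlink=True` would avoid the duplicated copies after publishing. The sketch uses absolute symlink targets, which assumes the repository is mounted at the same path on clients as on the stratum 0.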

Thanks,
--nate

David S

Sep 1, 2016, 11:01:31 PM
to conda - Public
I'd like this too and opened a GitHub issue for it recently. Nice to know I'm not the only one. Maybe add a comment here?
https://github.com/conda/conda/issues/330

Ian Stokes Rees

Sep 1, 2016, 11:43:34 PM
to David S, conda - Public
Nate,

Not possible today, but I suspect it would be pretty easy to add a --softlink option that would force the use of softlinks.  If you have even a little Python experience I wouldn't be surprised if you could create a starting-point PR in an hour or two.

If that isn't possible, then please create a new GH issue to this effect.  You could just cut-and-paste your original ML post here:

https://github.com/conda/conda/issues/new

David,

I think you pasted in the wrong GH issue number.  I tried to search for something on this topic but couldn't find one that seemed relevant.  Could you confirm the issue number?

Regards,

Ian



David

Sep 1, 2016, 11:50:28 PM
to Ian Stokes Rees, conda - Public
Hi Ian, sorry, I missed an 8 in my copy-paste.

Nate Coraor

Sep 2, 2016, 9:03:22 AM
to conda - Public, ijst...@continuum.io
Thanks all, I'll follow the issue. Ian, if I can find some time today I'll see what I can come up with.

Thanks,
--nate