Re: Compute Canada CVMFS and software stack

rpta...@computecanada.ca

Jul 16, 2018, 7:31:45 PM
to HSF Build and Packaging Tools Discussion Forum
Hi Ben, Laurent, Marco and Guilherme, (I think I got everyone?)

Thanks for getting in touch, it was good to chat with you last week at CHEP.
I am posting this to the Google Group that you mentioned.

For some background information, here is a presentation from FOSDEM 2018 earlier this year:
"Combining CVMFS, Nix, Lmod, and EasyBuild at Compute Canada"
https://fosdem.org/2018/schedule/event/computecanada/

And here is a presentation I gave at the CernVM Workshop around the same time (with some adapted
material from the FOSDEM presentation):
https://indico.cern.ch/event/608592/contributions/2858287/

We do not yet have public-facing documentation for usage of the Compute Canada software stack
outside of Compute Canada. But if you are interested in accessing the CVMFS repositories for
testing purposes/curiosity, you can simply do the following on a CVMFS client:

yum install https://package.computecanada.ca/yum/cc-cvmfs-public/Packages/computecanada-release-1.0-1.noarch.rpm
yum install cvmfs-config-computecanada

This sets up the CC yum repository, which provides the cvmfs-config-computecanada RPM. That RPM
configures CVMFS clients with the CC CVMFS config repository, which makes all other CC CVMFS
repositories available as well. Having the yum repository in place also ensures that clients
periodically get cvmfs-config-computecanada updates via yum. (Be careful if you are already using
a different CVMFS config repo, since a CVMFS client can only use one config repo.)

Then to set up the modules:
source /cvmfs/soft.computecanada.ca/config/profile/bash.sh

Here is the list of installed software modules:
https://docs.computecanada.ca/wiki/Available_software#List_of_modules
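
For anyone trying this out, a quick sanity check could look something like the sketch below. The helper name `cc_stack_setup` is made up for illustration and is not part of the CC stack. It assumes the profile script path given above, and that sourcing it defines the Lmod `module` command.

```shell
# Hypothetical helper (not part of the CC stack): source the profile script
# if it is readable, otherwise report that the repository is not reachable.
cc_stack_setup() {
    profile="${1:-/cvmfs/soft.computecanada.ca/config/profile/bash.sh}"
    if [ -r "$profile" ]; then
        # On clients that mount CVMFS via autofs, accessing the path
        # should trigger the mount.
        . "$profile"
    else
        echo "cannot read $profile (is the CVMFS repository mounted?)" >&2
        return 1
    fi
}

# Usage: set up the environment, then list the available modules.
if cc_stack_setup; then
    module avail
fi
```

If the function returns non-zero, the repository is most likely not configured or not mounted on that client.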

You can take a look at /cvmfs/test.dev.computecanada.ca/README for an example of using variant
symlinks in CVMFS to provide a sort of virtual software relocation capability. It is like an alias
that can be controlled on the level of individual client nodes.

Let me know if there was anything else we discussed that you would like further information on, or
any other comments or suggestions.

Thanks,
-rt

Chris Burr

Jul 17, 2018, 4:07:00 AM
to rpta...@computecanada.ca, hsf-pack...@googlegroups.com
Hi Ryan,

Unfortunately I didn't manage to catch you during CHEP but I'm Chris and I've done a lot of the
testing of Nix within LHCb.

Thanks for the links, this is very interesting! I have a few questions about your setup:

 ● For building packages with your custom store path do you use an instance of Hydra? Or just
   allow Nix to build everything at install time?

 ● What is the motivation for using easybuild?

   It looks like it allows you to configure the multiple versions and architectures while
   automatically generating the lmod files so users can configure the environments?

   If so, did you consider doing this within Nix as well? I was expecting to use overlays[1] for
   this, though your setup might predate them.

 ● Have you considered building the base packages that you get using Nix against multiple
   architectures? (I'm not sure if this actually has any potential benefits)

Cheers,

Chris


--
You received this message because you are subscribed to the Google Groups "HSF Build and Packaging Tools Discussion Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hsf-packaging-...@googlegroups.com.
To post to this group, send email to hsf-pack...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hsf-packaging-wg/a1b4b352-ad3b-4ca3-a215-3fa13e52f783%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

rpta...@computecanada.ca

Jul 17, 2018, 3:25:56 PM
to HSF Build and Packaging Tools Discussion Forum
Hi Chris,

No, we do not use Hydra. The software is built once and installed onto CVMFS for distribution.

The software specialists find that EasyBuild is more suitable for building large, complex scientific applications, whereas Nix is more suitable for small self-contained packages with few dependencies. We also use the Nix OS environment profile (currently 16.09).

Aside from doing build automation, EasyBuild also does automated testing, logs the building process, commits the build recipe to git for reproducibility, automatically updates documentation when a new package is built, and automatically generates Lmod modules for each package.

Perhaps some of the things that we are doing with EasyBuild could also be done with Nix, but nevertheless we are using a composite approach, taking several components and combining the best features of each of them for the situations in which each is most suitable.

I work on the CVMFS side so I am not as familiar with the Nix + Easybuild parts, but I asked an expert from the software team about Nix overlays.
The short response I got was that we do not need Nix overlays and/or they do not fit into our approach.


Slide 13 of my CernVM workshop presentation describes the different scenarios dictating whether Nix, EasyBuild, or both are used.

> It looks like it allows you to configure the multiple versions and architectures while
> automatically generating the lmod files so users can configure the environments?

Yes. On each CC cluster an environment variable is set which can be used to load, for example,
software built with AVX vs AVX2 vs AVX-512 according to the CPUs on that cluster, or in principle to load Infiniband vs Omnipath.
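
To illustrate the idea, here is a rough sketch of how such a variable could be derived from the CPU flags on a Linux node. The variable name `CLUSTER_SIMD_LEVEL`, the function name `pick_simd_level`, and the level labels are all made up for this example. On the real clusters the variable is simply set per cluster, and this is not the actual mechanism.

```shell
# Hypothetical helper: map a space-separated CPU flag list to the most
# capable SIMD level it contains. The labels are illustrative only.
pick_simd_level() {
    case " $1 " in
        *" avx512f "*) echo avx512 ;;
        *" avx2 "*)    echo avx2 ;;
        *" avx "*)     echo avx ;;
        *)             echo generic ;;
    esac
}

# On Linux, the flags of the first CPU can be read from /proc/cpuinfo.
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)

# CLUSTER_SIMD_LEVEL is a made-up variable name for this sketch.
CLUSTER_SIMD_LEVEL=$(pick_simd_level "$cpu_flags")
export CLUSTER_SIMD_LEVEL
echo "would select software built for: $CLUSTER_SIMD_LEVEL"
```

A module system can then use such a variable to choose which build of a package to expose on a given cluster.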

> Have you considered building the base packages that you get using Nix against multiple
> architectures? (I'm not sure if this actually has any potential benefits)

All the CC clusters have x86 processors, so in this context different "architectures" effectively means different CPU generations supporting different levels of AVX extensions.
Some extra work would be needed to support e.g. ARM, but the Nix+Easybuild framework we have in place would take care of a large part of it.

Generally speaking, any packages built with Nix are not considered performance-critical (e.g. OS stuff like awk, grep, etc. and miscellaneous utilities like texinfo etc.). However, if there is a package built with Nix that is found to have some performance impact, we could compile a new version with AVX-512 extensions or whatever to realize the performance gain.
This would constitute moving from the yellow layer to the green layer on slide 13.


Thanks,
-rt

rpta...@computecanada.ca

Jul 17, 2018, 3:46:27 PM
to HSF Build and Packaging Tools Discussion Forum
Relaying some more responses I got from the software experts. (I will see if any of them want to join this group.)

"nixpkgs overlays are interesting but they basically allow us to use upstream nixpkgs with our own changes overlaid instead of the current system of maintaining a git fork of nixpkgs."

So as I understand it we are already doing something equivalent.

Also:
"EasyBuild is vastly superior to Nix when it comes to building scientific packages. "

"Nix is beautifully consistent as long as you stick with only Nix. As soon as you build things outside of Nix (install and compile Python, Perl or R packages for example), it is going to link to the Nix store in a way that Nix is unaware of, and the next Nix garbage collection is going to break all those things.  Going this way is a catastrophe waiting to happen on a HPC cluster environment with end users. We had to backtrack a few packages that were installed in Nix (Python, Perl, Qt) and compile them with EasyBuild rather than Nix because we ran into such problems."

"I am even considering replacing Nix by Gentoo prefix because it is simpler (and security updates in low-level libraries apply straight away instead of needing to recompile everything in Nix that depends on it -- though we are guarded against most security issues since we are strictly non-suid userland only, so can compile and run the same insecure stuff a regular user can)."

Thanks,
-rt

Maxime Boissonneault

Jul 17, 2018, 3:49:20 PM
to HSF Build and Packaging Tools Discussion Forum
Hi all,
Ryan asked me to comment, specifically on the question about using EasyBuild vs Nix. 

Nix is beautifully consistent as long as you stick with only Nix. As soon as you build things outside of Nix (install and compile Python, Perl or R packages for example), it is going to link to the Nix store in a way that Nix is unaware of, and the next Nix garbage collection is going to break all those things. Going this way is a catastrophe waiting to happen on an HPC cluster environment with end users. We had to backtrack a few packages that were installed in Nix (Python, Perl, Qt) and compile them with EasyBuild rather than Nix because we ran into such problems.

EasyBuild is vastly superior to Nix when it comes to building scientific software, both because it already has way more supported *scientific* packages than Nix, and because it interacts in a much more natural way with other build systems. It also generates modules, which users are already accustomed to on most clusters.
Nix was never designed for running performance-critical scientific applications. It was designed as an OS, and it does a great job at that, which is why we are using it for that layer. This also means that it does not need to be optimized for a specific architecture (most HPC centers use binary RPM packages for the OS layer anyway). We let EasyBuild take care of the optimization.

Brett Viren

Jul 17, 2018, 5:10:16 PM
to rpta...@computecanada.ca, HSF Build and Packaging Tools Discussion Forum
rpta...@computecanada.ca writes:

> "Nix is beautifully consistent as long as you stick with only Nix. As
> soon as you build things outside of Nix (install and compile Python,
> Perl or R packages for example), it is going to link to the Nix store
> in a way that Nix is unaware of, and the next Nix garbage collection
> is going to break all those things.

Nix GC is invoked manually. If I build stuff against /usr/local
and then do "rm -r /usr/local", the same breakage will happen.

> Going this way is a catastrophe
> waiting to happen on a HPC cluster environment with end users. We had
> to backtrack a few packages that were installed in Nix (Python, Perl,
> Qt) and compile them with EasyBuild rather than Nix because we ran
> into such problems."

If the high level packages are built with Nix then I don't see how the
Nix GC would remove their dependencies while still keeping them around
broken.

OTOH, if you are saying that high level software was built outside of
Nix and against Nix packages and then a GC was done, then yeah, that
outside software will be left high and dry. This is a feature of any
build system (eg, also Spack) that does not follow a mutate-in-place
installation policy.

I use Nix (with a few overlays) and Spack (plus Spack views) as two ways
to provide external dependencies for developing one particular project.
That builds fully outside of these systems. Both work well for
development. For releases, I provide Spack (but not yet released Nix)
build recipes.

I think valid complaints against Nix are that:

- Its config language has a really steep learning curve.
- Its documentation also has a learning curve!
- One needs a /nix mount or must forgo sharing "standard" binaries.

I think Spack does better on these points at the cost of not having as
well developed a binary sharing ecosystem. Installing Nix binaries
typically goes more smoothly than building with Spack from source.
Developing Nix build recipes requires far far more front-loaded learning
than writing Spack package.py files.

-Brett.




Maxime Boissonneault

Jul 17, 2018, 6:10:04 PM
to Brett Viren, rpta...@computecanada.ca, HSF Build and Packaging Tools Discussion Forum
On 2018-07-17 5:10 PM, Brett Viren wrote:
> rpta...@computecanada.ca writes:
>
>> "Nix is beautifully consistent as long as you stick with only Nix. As
>> soon as you build things outside of Nix (install and compile Python,
>> Perl or R packages for example), it is going to link to the Nix store
>> in a way that Nix is unaware of, and the next Nix garbage collection
>> is going to break all those things.
> Nix GC is evoked manually. If I build against stuff against /usr/local
> and then do "rm -r /usr/local", same breakage will happen.
Yes, except that you don't have a gazillion different versions of
/usr/local (like you would with Nix), and that GC is not supposed to
break anything that is used (that's the definition of GC, it deletes
*garbage*). It works correctly for anything that is installed with Nix.
The problem is that Nix has no way to know what's garbage and what is
not if *anything* is built outside of Nix (and users do that all the time).

So, basically, this means that if:
- you expose Nix to users
- you let the Nix compiler link against the nix store

then you cannot run garbage collection *ever* without risking breaking
some of your users' jobs. That means you also cannot apply security fixes
or patches to fix something that's broken.

To prevent that, the way we work with our stack is that we expose a
single Nix profile to our users, and we modified the linker wrapper so
that any linking that is done outside of Nix itself is done through that
one profile, not through the nix store.
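
The difference between the two kinds of reference can be shown with a toy model. This is only a sketch: plain directories and symlinks stand in for the Nix store, the profile, and garbage collection, and no Nix commands are involved. The hash-like prefixes and version numbers are invented.

```shell
# Toy model only: directories and symlinks imitate a store, a profile and GC.
demo=$(mktemp -d)
mkdir -p "$demo/store/aaaa-python-2.7.14/bin" \
         "$demo/store/bbbb-python-2.7.15/bin" \
         "$demo/user"
echo "old" > "$demo/store/aaaa-python-2.7.14/bin/python"
echo "new" > "$demo/store/bbbb-python-2.7.15/bin/python"

# The profile is the only reference the "package manager" knows about.
ln -s "$demo/store/aaaa-python-2.7.14" "$demo/profile"

# A build outside the package manager records one path of each kind.
ln -s "$demo/store/aaaa-python-2.7.14/bin/python" "$demo/user/direct"
ln -s "$demo/profile/bin/python" "$demo/user/via_profile"

# Upgrade: repoint the profile, then "collect" what it no longer references.
ln -sfn "$demo/store/bbbb-python-2.7.15" "$demo/profile"
rm -rf "$demo/store/aaaa-python-2.7.14"

# test -e follows symlinks, so a dangling link is reported as broken.
for link in direct via_profile; do
    if [ -e "$demo/user/$link" ]; then
        echo "$link: still resolves"
    else
        echo "$link: broken"
    fi
done
```

In this model the reference that goes straight into the store dangles after the cleanup, while the one that goes through the profile picks up the upgraded version.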

We are currently sitting at 53k different directories in our nix store
on our build-node, many of which are old versions of packages that have
since been upgraded. On our deployed stack, we only have 5k. So,
more than 90% of the stuff we ever built in Nix is no longer used.

>> Going this way is a catastrophe
>> waiting to happen on a HPC cluster environment with end users. We had
>> to backtrack a few packages that were installed in Nix (Python, Perl,
>> Qt) and compile them with EasyBuild rather than Nix because we ran
>> into such problems."
> If the high level packages are built with Nix then I don't see how the
> Nix GC would remove their dependencies while still keeping them around
> broken.
virtualenv copies "python" from Nix into the user's directory. It is
hard-coded to look for paths in the nix store. The user then installs
python packages with pip. Even if the user is aware of Nix, they could
create a virtualenv with Python 2, and then install Python 3 in their own
profile, or install a new version of Python 2. Nix has no idea about the
external virtual environment. If garbage collection is run, it will
delete the old, seemingly unused python with its libraries... and break
the virtual environment.

Perl will do the same too. If you call perl or cpan to install packages
in your home folder, and perl or cpan were installed with Nix, the installed
package will be full of paths to the nix store, which if GC'ed will then
break the installed package.
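
One way to see how exposed a user directory is would be to search it for hard-coded store paths. The helper below is a hypothetical sketch using plain grep. It assumes the default /nix/store/ prefix unless another prefix is passed, and the file contents in the demonstration are invented.

```shell
# Hypothetical helper: list files under directory $1 that mention a store
# prefix ($2, default /nix/store/). -r recurses, -l prints only file names,
# -F treats the prefix as a fixed string rather than a regular expression.
find_store_refs() {
    grep -rlF -- "${2:-/nix/store/}" "$1" 2>/dev/null
}

# Small self-contained demonstration with made-up file contents.
work=$(mktemp -d)
printf '#!/nix/store/aaaa-python-2.7.14/bin/python\n' > "$work/pip"
printf '#!/usr/bin/env python\n' > "$work/portable"
find_store_refs "$work"
```

Any file this reports would stop working once the store path it names is garbage collected.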

Another problem that we ran into is that Nix tends to split many
packages (such as Qt) into multiple installation directories. Again, Nix
itself is self-coherent, but the rest of the world is typically not
Nix-aware and will likely make assumptions such as "all Qt libraries are
installed in the same directory as qmake".


Maxime

Guilherme Amadio

Jul 18, 2018, 2:37:17 AM
to rpta...@computecanada.ca, HSF Build and Packaging Tools Discussion Forum, Ben.M...@warwick.ac.uk, Marco.C...@cern.ch, laurent.a...@gmail.com
Hi Ryan,

(CCing others by hand as I don't know who is in hsf-packaging list.)

Thank you for the interesting thread and the discussions at CHEP.

On Tue, Jul 17, 2018 at 12:46:27PM -0700, rpta...@computecanada.ca wrote:
> Relaying some more responses I got from the software experts. (I will see
> if any of them want to join this group.)
>
> "nixpkgs overlays are interesting but they basically allow us to use
> upstream nixpkgs with our own changes overlaid instead of the current
> system of maintaining a git fork of nixpkgs."
>
> So as I understand it we are already doing something equivalent.

I think this is a common thing among package managers. With Portage,
there is a tool (https://wiki.gentoo.org/wiki/Layman) to manage
overlays, but I usually stick with the main tree.

> Also:
> "EasyBuild is vastly superior to Nix when it comes to building scientific
> packages. "

I never used EasyBuild, could someone that knows it explain some of the
differences with other package managers?

> "Nix is beautifully consistent as long as you stick with only Nix. As soon
> as you build things outside of Nix (install and compile Python, Perl or R
> packages for example), it is going to link to the Nix store in a way that
> Nix is unaware of, and the next Nix garbage collection is going to break
> all those things. Going this way is a catastrophe waiting to happen on a
> HPC cluster environment with end users. We had to backtrack a few packages
> that were installed in Nix (Python, Perl, Qt) and compile them with
> EasyBuild rather than Nix because we ran into such problems."

I think that if you install software that the package manager is unaware
of, it will be problematic no matter which package manager is used. That
said, if with nix you link directly to the hashed locations, then it does
look like the situation could be worse. In Gentoo prefix things are
updated in place, so unless the API or ABI changes, nothing will break.

> "I am even considering replacing Nix by Gentoo prefix because it is simpler
> (and security updates in low-level libraries apply straight away instead of
> needing to recompile everything in Nix that depends on it -- though we are
> guarded against most security issues since we are strictly non-suid
> userland only, so can compile and run the same insecure stuff a regular
> user can)."

I'd be interested in helping whoever made this comment to switch :-)

In general, for experiments, the way to go is to build a stack, then
declare it frozen (except for security updates, like openssl, etc), so
that the things that are external to the stack, if any, do not need to
be recompiled. Also, for reproducibility it's important that versions of
things don't change as well.

For general use in development, it's ok to keep a rolling release for
the next stack that will go into production until everyone is satisfied
and then it can be frozen at that point.

Cheers,
-Guilherme

Maxime Boissonneault

Jul 18, 2018, 7:41:06 AM
to Guilherme Amadio, rpta...@computecanada.ca, HSF Build and Packaging Tools Discussion Forum, Ben.M...@warwick.ac.uk, Marco.C...@cern.ch, laurent.a...@gmail.com, Bart Oldeman
On 2018-07-18 2:37 AM, Guilherme Amadio wrote:
> Hi Ryan,
>
> (CCing others by hand as I don't know who is in hsf-packaging list.)
>
> Thank you for the interesting thread and the discussions at CHEP.
>
> On Tue, Jul 17, 2018 at 12:46:27PM -0700, rpta...@computecanada.ca wrote:
>> Relaying some more responses I got from the software experts. (I will see
>> if any of them want to join this group.)
>>
>> "nixpkgs overlays are interesting but they basically allow us to use
>> upstream nixpkgs with our own changes overlaid instead of the current
>> system of maintaining a git fork of nixpkgs."
>>
>> So as I understand it we are already doing something equivalent.
> I think this is a common thing among package managers. With Portage,
> there is a tool (https://wiki.gentoo.org/wiki/Layman) to manage
> overlays, but I usually stick with the main tree.
>
>> Also:
>> "EasyBuild is vastly superior to Nix when it comes to building scientific
>> packages. "
> I never used EasyBuild, could someone that knows it explain some of the
> differences with other package managers?
Here is a presentation that compares it with many other package managers:
https://fosdem.org/2018/schedule/event/installing_software_for_scientists/attachments/slides/2437/export/events/attachments/installing_software_for_scientists/slides/2437/20180204_installing_software_for_scientists.pdf

To me, the main selling points are:
1) Most *scientific* software is already supported. Nix impresses with
the number of packages, but only a very small percentage of them are
scientific packages. Nix focuses on OS packages.
2) It generates modules (and all our clusters in the past 20 years have
used modules, so users are accustomed to that).
3) Applications are optimized for the architecture by default.
4) It uses a language that is already known by most people (Python) and
so it is easy to approach.
>
>> "Nix is beautifully consistent as long as you stick with only Nix. As soon
>> as you build things outside of Nix (install and compile Python, Perl or R
>> packages for example), it is going to link to the Nix store in a way that
>> Nix is unaware of, and the next Nix garbage collection is going to break
>> all those things. Going this way is a catastrophe waiting to happen on a
>> HPC cluster environment with end users. We had to backtrack a few packages
>> that were installed in Nix (Python, Perl, Qt) and compile them with
>> EasyBuild rather than Nix because we ran into such problems."
> I think that if you install software that the package manager is unaware
> of, it will be problematic no matter which package manager is used. That
> said, if with nix you link directly to the hashed locations, then it does
> look like the situation could be worse. In Gentoo prefix things are
> updated in place, so unless the API or ABI changes, nothing will break.
Precisely. Most package managers won't install in hashed locations and
then create a link farm. This is both the strength (in some cases) and
the weakness of Nix.
>> "I am even considering replacing Nix by Gentoo prefix because it is simpler
>> (and security updates in low-level libraries apply straight away instead of
>> needing to recompile everything in Nix that depends on it -- though we are
>> guarded against most security issues since we are strictly non-suid
>> userland only, so can compile and run the same insecure stuff a regular
>> user can)."
> I'd be interested in helping who made this comment switch :-)
That would be Bart Oldeman, our software specialist who developed the
core of our stack and who was dragged into all of the gory details of
these package managers.

> In general, for experiments, the way to go is to build a stack, then
> declare it frozen (except for security updates, like openssl, etc), so
> that the things that are external to the stack, if any, do not need to
> be recompiled. Also, for reproducibility it's important that versions of
> things don't change as well.
Well, yes and no. For heavily developed scientific packages, that's
often (but not always) true: you don't want to change versions. Linux
packages (glibc, libssl, libX11, gcc, etc.) tend not to break anything
when updated between minor versions. Very rarely have I seen an update of
a CentOS package break anything that was depending on it. The fact that
Nix handles minor version upgrades exactly the same as if they were a
completely different package is what's causing problems.


--
---------------------------------
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Chair - Calcul Québec Research Support Coordination Committee
Team lead - Research Support National Team, Compute Canada
Software Carpentry instructor
Ph.D. in physics
