A few newbie questions


Hadrien Grasland

Jul 16, 2018, 10:00:41 AM
to sp...@googlegroups.com

Hi everyone,

I recently started experimenting with Spack for my software packaging needs, and so far I like it a lot, but there are some things which I do not understand yet:

  1. Some packages, such as Eigen and Valgrind, have variants which have no obvious effect on the build besides adding dependencies. What does it mean?
  2. Some packages come with many variants enabled by default. Shouldn't minimal builds be the default, with users enabling extra variants as desired?
  3. The ability of `spack create` to automagically fill the package.py seems limited in the common case of a CMake-based package downloaded via git. Can it be helped?
  4. Is there a way to avoid repeating the "git=" parameter over and over again in package.py when all versions of a package are available as commits in a git repository?
  5. Sometimes, running "spack load" right after "spack install" fails, saying that a module does not exist. Re-loading setup-env.sh fixes the issue. What is going on?

Thanks in advance for your explanations! I would be happy to help fix anything in this list which is more of a bug than a feature.

Cheers,
Hadrien

Hadrien Grasland

Jul 16, 2018, 10:06:29 AM
to sp...@googlegroups.com

PS: Oh, and an extra one. Many useful software projects exist as a patch to a more established software project which has not been upstreamed yet. For example, Verrou is a CESTAC floating-point error checker which is packaged as a valgrind patch, and Templight is a C++ template bloat debugging tool which is packaged as a clang patch.

What is the expected way of packaging such projects? Should they be packaged as a variant of the upstream project, or as a no-op package which does nothing but depend on a patched version of the upstream project?

Elizabeth A. Fischer

Jul 16, 2018, 10:15:02 AM
to Hadrien Grasland, Spack
On Mon, Jul 16, 2018 at 10:02 AM, Hadrien Grasland <gras...@lal.in2p3.fr> wrote:

Hi everyone,

I recently started experimenting with Spack for my software packaging needs, and so far I like it a lot, but there are some things which I do not understand yet:

  1. Some packages, such as Eigen and Valgrind, have variants which have no obvious effect on the build besides adding dependencies. What does it mean?
Some packages have "automagic" features, which turn on/off depending on whether particular dependencies are available at build time.  This is not considered best practice, but it is widespread.
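To make that concrete, here is a rough sketch of how a Spack package makes such a feature explicit instead of automagic (the package and dependency names are made up):

from spack import *  # 2018-era convention for pulling directives into a package.py


class ExampleAutomagic(AutotoolsPackage):
    """Hypothetical package illustrating an explicit opt-in feature."""

    homepage = "https://example.org"
    url = "https://example.org/example-1.0.tar.gz"

    version('1.0', sha256='0' * 64)  # placeholder checksum

    # The variant turns the formerly-automagic feature into an explicit,
    # user-visible choice...
    variant('baz', default=False, description='Enable the optional baz feature')

    # ...and the dependency is only pulled in when the variant is enabled, so
    # the build no longer silently picks up whatever happens to be installed.
    depends_on('libbaz', when='+baz')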
  2. Some packages come with many variants enabled by default. Shouldn't minimal builds be the default, with users enabling extra variants as desired?
It all depends on the situation; we try to set up defaults so the package is useful for a wide range of users without introducing unnecessary dependencies.  Any package could be converted to one with no variants turned on by default by negating the sense of the variants that are currently on.  But this would be a pointless exercise that introduces a lot of evil "negative logic" (e.g. a variant +dont-eat instead of a variant +eat).

 
  3. The ability of `spack create` to automagically fill the package.py seems limited in the common case of a CMake-based package downloaded via git. Can it be helped?
@adamjstewart has put a lot of work into automagic writing of package.py files.  But it is limited because in many cases, the required information is only written in English; and we haven't yet built AI bots that can read lengthy "Installation Instructions" pages and convert them to Python.

 
  4. Is there a way to avoid repeating the "git=" parameter over and over again in package.py when all versions of a package are available as commits in a git repository?

In short, no.

The standard way is to download tarballs, with a provided checksum.  The checksum ensures files haven't been tampered with since the author wrote the Spack package.  Checksums are not in effect when downloading from a symbolic git branch or tag; however, they ARE when downloading from a git hash.

GitHub and others provide a convenient way to download any Git branch, tag or commit as a tarball (which is checksummable).
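For example, a hypothetical package that fetches GitHub's generated tag tarballs might look like this (all names and checksums below are placeholders):

from spack import *


class ExampleTarball(CMakePackage):
    """Hypothetical package fetched as checksummable tarballs from GitHub."""

    homepage = "https://github.com/example-org/example"
    # GitHub serves any branch, tag or commit as a tarball at a predictable URL.
    url = "https://github.com/example-org/example/archive/v1.2.3.tar.gz"

    # The checksum pins the exact bytes Spack will accept for this version;
    # a real hash would come from running `spack checksum example`.
    version('1.2.3', sha256='0' * 64)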
 
  5. Sometimes, running "spack load" right after "spack install" fails, saying that a module does not exist. Re-loading setup-env.sh fixes the issue. What is going on?
I don't know.  But "spack load" is not reliable, since its behavior depends on what else happens to be installed.  I use Spack Environments to avoid these problems, and reserve "spack load" for casual, occasional use.

Many useful software projects exist as a patch to a more established software project which has not been upstreamed yet. For example, Verrou is a CESTAC floating-point error checker which is packaged as a valgrind patch, and Templight is a C++ template bloat debugging tool which is packaged as a clang patch.
What is the expected way of packaging such projects? Should they be packaged as a variant of the upstream project, or as a no-op package which does nothing but depend on a patched version of the upstream project?

Download from a specific git commit out of the upstream repo, and invent a Spack-only version number for that commit.  When this "hack" is no longer needed, remove your unofficial "release" from Spack and replace with the official release.

-- Elizabeth

Hadrien Grasland

Jul 17, 2018, 5:19:20 AM
to sp...@googlegroups.com

Hi Elizabeth,

Many thanks for your explanations! I will expand on specific points below.


  3. The ability of `spack create` to automagically fill the package.py seems limited in the common case of a CMake-based package downloaded via git. Can it be helped?
@adamjstewart has put a lot of work into automagic writing of package.py files.  But it is limited because in many cases, the required information is only written in English; and we haven't yet built AI bots that can read lengthy "Installation Instructions" pages and convert them to Python.

I am certainly well aware of this problem, but I do not think it is the heart of the issue here.

Most recent C++ packages can be installed using a variant of the following procedure:

  • git clone --branch=<release tag> --depth=1 <package repo URL>
  • cd <package dir> && mkdir build && cd build
  • cmake <config flags> ..
  • make
  • sudo make install

Using git repos like this is more convenient than using tarballs in several ways. The first one, which is directly relevant to Spack, is that the "git tag" command is all you need in order to enumerate available project releases. There is no need to investigate a project-specific procedure for enumerating release tarballs (some HTTP mirrors let you easily do that by truncating the URL, others... not so easily).

Another advantage of using git for downloading software releases is that there is a lower barrier to contribution. If you find something broken in a software release that you downloaded via git, all you need to do is cd into the repo, create a development branch, and start committing some changes. Once the patch is mature, you fork the official project repo, push your changes, and submit the patch as an MR. This is to be contrasted with tarballs, whose "read-only" nature creates a barrier to contribution, as there is more work to do before one is able to contribute a bugfix (find the project website, then the git repo, clone in a different directory, find the tag equivalent to the tarball...).

For these reasons, and others which are not relevant to the Spack workflow (no need to fiddle with tar's infamous CLI flags, easy to update when a new release comes out), I have generally stopped using tarballs as a mechanism for downloading software source code, except in cases where I am forced into it by the upstream project.

Coming from this perspective, I was expecting "spack create <package repo URL>" to do what I would do myself when packaging such a project:

  • Clone the git repository in a temp directory
  • Enumerate release tags (those which are nothing but a sequence of numbers with an optional "v" in front of them)
  • Create a "version" entry for each of these tags, optionally using commit hashes instead of tag for better reproducibility.
  • Automagically mark the package as a CMakePackage

However, from the remainder of your reply, I understand that the reason why this does not happen is that tarballs are currently considered to be the preferred release distribution medium, and Spack therefore only has limited support for interacting with git repositories.
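To make the idea concrete, here is a rough sketch (illustrative only, not actual Spack code) of how such a `spack create` pass could enumerate release tags with nothing but git:

import re
import subprocess

# Tags that are "nothing but a sequence of numbers with an optional v in front".
RELEASE_TAG = re.compile(r'^v?\d+(\.\d+)+$')


def release_tags(repo_url):
    """Return a {version: commit hash} mapping of release-looking remote tags."""
    output = subprocess.check_output(['git', 'ls-remote', '--tags', repo_url])
    versions = {}
    for line in output.decode('utf-8').splitlines():
        sha, ref = line.split('\t')
        tag = ref[len('refs/tags/'):]
        # Annotated tags show up twice; the '^{}' entry carries the commit hash.
        peeled = tag.endswith('^{}')
        if peeled:
            tag = tag[:-3]
        if not RELEASE_TAG.match(tag):
            continue
        version = tag.lstrip('v')
        if peeled or version not in versions:
            versions[version] = sha
    return versions

Each entry could then be emitted as a version(..., commit=...) line, and the presence of a CMakeLists.txt at the repository root could be used to pick the CMakePackage template.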


 

  4. Is there a way to avoid repeating the "git=" parameter over and over again in package.py when all versions of a package are available as commits in a git repository?

In short, no.

The standard way is to download tarballs, with a provided checksum.  The checksum ensures files haven't been tampered with since the author wrote the Spack package.  Checksums are not in effect when downloading from a symbolic git branch or tag; however, they ARE when downloading from a git hash.

GitHub and others provide a convenient way to download any Git branch, tag or commit as a tarball (which is checksummable).

These tarball-based download methods are dependent on the specific software forge in use, and often clunkier than git clone when it comes to enumerating available releases.

For example, if I take the ACTS project on CERN's Gitlab @ https://gitlab.cern.ch/acts/acts-core , I can indeed easily get a URL-based tarball of any branch or tag using URLs like https://gitlab.cern.ch/acts/acts-core/-/archive/master/acts-core-master.tar.gz . But trying to enumerate the available branches or tags using the intuitive URL https://gitlab.cern.ch/acts/acts-core/-/archive/ will just lead me to a 404 page, either because Gitlab does not support this or because a sysadmin somewhere disabled the feature.

Contrast with the git-based approach: enumerating tags directly from the repository works like a charm on any git-based project which follows the usual release tag naming conventions (which Spack already has built-in support for, as it can enumerate tarballs in HTTP mirrors), and enables extra niceties such as being able to easily cd into the git repo and experiment with a patch using standard Git workflows.

In any case, I understand that for now, the answer is that Spack only provides first-class support for tarball-based downloads. I only hope that the above discussion will help you understand where I am coming from, and why I think that Git-based download could benefit from better integration and ergonomics. As said before, I am available to contribute improvements in this direction if someone feels like helping me get started.


 
  5. Sometimes, running "spack load" right after "spack install" fails, saying that a module does not exist. Re-loading setup-env.sh fixes the issue. What is going on?
I don't know.  But "spack load" is not reliable, since its behavior depends on what else happens to be installed.  I use Spack Environments to avoid these problems, and reserve "spack load" for casual, occasional use.

Can you expand a bit on what you mean by "Spack Environments"? The Spack documentation only directly refers to filesystem views and environment modules; are you referring to the former here?



Many useful software projects exist as a patch to a more established software project which has not been upstreamed yet. For example, Verrou is a CESTAC floating-point error checker which is packaged as a valgrind patch, and Templight is a C++ template bloat debugging tool which is packaged as a clang patch.
What is the expected way of packaging such projects? Should they be packaged as a variant of the upstream project, or as a no-op package which does nothing but depend on a patched version of the upstream project?

Download from a specific git commit out of the upstream repo, and invent a Spack-only version number for that commit.  When this "hack" is no longer needed, remove your unofficial "release" from Spack and replace with the official release.

I am not sure if I fully understood you here.

To fuel the discussion, here is my current experiment at integrating Verrou into spack by modifying the parent valgrind package : https://github.com/HadrienG2/spack/compare/e37554f...e4ff076 . Please do not mind the hacks around, this is not meant for merging yet.

Am I correct that the approach which you would suggest is to replace my current variant-based approach with another approach which adds a "verrou-2.0" release to valgrind, and otherwise generally does the same thing?

Cheers,
Hadrien

Elizabeth A. Fischer

Jul 17, 2018, 10:01:33 AM
to Hadrien Grasland, Spack
Hadrien,

Yes, git is great.  But I think your assessment of git download support in Spack as "rudimentary" is off the mark.

In any case, security is a very real issue that must be dealt with.  Checksums aren't a magic guarantee; but they are FAR better than nothing.  They ensure the version YOU install is the same as the version seen by the original package author.  If that was a long time ago and nobody has complained about that version, then you have at least some confidence that it does not contain malicious code.  Without checksums, anyone could insert malicious code into any commonly used open source project and you'd never be the wiser.  Without the checksum capability, Spack would not be allowed to be used in many places it is used.

Most recent C++ packages can be installed using a variant of the following procedure:

  • git clone --branch=<release tag> --depth=1 <package repo URL>
  • cd <package dir> && mkdir build && cd build
  • cmake <config flags> ..
  • make
  • sudo make install

Except Spack does two very important things that your manual procedure above does not:

1. Check checksums
2. Ensure proper dependencies are installed BEFORE installing this package.

There is no standardized machine-readable way in CMake to specify dependencies.  Most of the work writing a Spack package is in getting the dependencies and variants right, not in downloading the source code.
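To illustrate, here is a hypothetical CMake-based package (all names made up): the download is a single line, while the dependencies, variants and option mapping below are the parts no tool can infer for you:

from spack import *


class ExampleCmake(CMakePackage):
    """Hypothetical CMake-based package."""

    homepage = "https://example.org"
    url = "https://example.org/example-1.0.0.tar.gz"

    version('1.0.0', sha256='0' * 64)  # placeholder checksum

    variant('python', default=False, description='Build the Python bindings')

    depends_on('cmake@3.9:', type='build')
    depends_on('boost')
    depends_on('python@3:', when='+python')

    def cmake_args(self):
        # Translate Spack variants into the project's CMake options.
        enable = 'ON' if '+python' in self.spec else 'OFF'
        return ['-DBUILD_PYTHON_BINDINGS={0}'.format(enable)]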
 

For these reasons... I have generally stopped using tarballs as a mechanism for downloading software source code,

So have I.  See here:

This project uses (checksummable) tarball download for released versions.  For users wishing to install the latest development version --- and willing to accept the risks --- it uses a (non-checksummable) git download.  This allows me to update stuff without having to change a checksum (or git hash) in my Package every time I update.  One should only use tag or branch-based downloads for packages they trust (i.e. packages they control).
 

These tarball-based download methods are dependent on the specific software forge in use, and often clunkier than git clone when it comes to enumerating available releases.

Use git to enumerate available releases, then generate tarball-based download URLs.

except in cases where I am forced into it by the upstream project.

Coming from this perspective, I was expecting "spack create <package repo URL>" to do what I would do myself when packaging such a project:

  • Clone the git repository in a temp directory
  • Enumerate release tags (those which are nothing but a sequence of numbers with an optional "v" in front of them)
  • Create a "version" entry for each of these tags, optionally using commit hashes instead of tag for better reproducibility.
  • Automagically mark the package as a CMakePackage
You can add this automagic to `spack create` if you like.  Two recommendations:

1. Run any designs past @adamjstewart before implementing them.
2. If you want to use the git download method, generate git downloads that download by hash, not tag. This is equivalent to checksums, ensuring against malicious alteration.  In other words, you should generate something like this, based on inspection of the git repo.  I promise to object to any system that encourages circumvention of checksums through the use of automagic.

version('1.2.1',
        git='https://github.com/citibeth/icebin.git',
        commit='fj2yh47r2ifisuhkuhf')

Or just generate the appropriate tarball download URL if you're downloading from GitHub, which is 90% of packages these days.

  4. Is there a way to avoid repeating the "git=" parameter over and over again in package.py when all versions of a package are available as commits in a git repository?

This is not a serious problem, especially if you're automagically generating your packages.


Many useful software projects exist as a patch to a more established software project which has not been upstreamed yet. For example, Verrou is a CESTAC floating-point error checker which is packaged as a valgrind patch, and Templight is a C++ template bloat debugging tool which is packaged as a clang patch.
What is the expected way of packaging such projects? Should they be packaged as a variant of the upstream project, or as a no-op package which does nothing but depend on a patched version of the upstream project?

Download from a specific git commit out of the upstream repo, and invent a Spack-only version number for that commit.  When this "hack" is no longer needed, remove your unofficial "release" from Spack and replace with the official release.

I am not sure if I fully understood you here. 

To fuel the discussion, here is my current experiment at integrating Verrou into spack by modifying the parent valgrind package : https://github.com/HadrienG2/spack/compare/e37554f...e4ff076 . Please do not mind the hacks around, this is not meant for merging yet.


OK, this is an exceptional circumstance.  99.9% of packages require only their own download.  The general idea is 1 repo = 1 tarball = 1 download = 1 Spack package.  You are right to use the resource mechanism when more than one download is needed.  In that case, I would recommend that verrou be the main download tarball (since this is the verrou package), and that valgrind be the resource that gets downloaded in addition.  You will probably need to do some extra work in the package to set up the stage.
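Roughly, I am imagining a layout like this (the URLs, versions and checksums are placeholders, not the real verrou/valgrind values):

from spack import *


class Verrou(AutotoolsPackage):
    """Sketch only: verrou as the main download, valgrind fetched as a resource."""

    homepage = "https://github.com/edf-hpc/verrou"
    url = "https://github.com/edf-hpc/verrou/archive/v2.0.0.tar.gz"

    version('2.0.0', sha256='0' * 64)  # placeholder checksum

    # Valgrind sources are fetched alongside and unpacked into the stage,
    # where the package's patch/build steps graft verrou onto them.
    resource(name='valgrind',
             url='https://sourceware.org/pub/valgrind/valgrind-3.13.0.tar.bz2',
             sha256='0' * 64,  # placeholder checksum
             destination='valgrind-src')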

A MUCH better approach would be if verrou's authors could simply maintain verrou as a branch off of valgrind.  Git is made for this kind of thing, and it would make everybody's life SO MUCH EASIER --- verrou's life, users' life, Spack's life.  That's what I thought you meant by "Many useful software projects exist as a patch to a more established software project which has not been upstreamed yet." 

Am I correct that the approach which you would suggest is to replace my current variant-based approach with another approach which adds a "verrou-2.0" release to valgrind, and otherwise generally does the same thing?

No

-- Elizabeth

Gamblin, Todd

Jul 17, 2018, 8:38:52 PM
to Elizabeth A. Fischer, Hadrien Grasland, Spack
Hadrien,

Coming from this perspective, I was expecting "spack create <package repo URL>" to do what I would do myself when packaging such a project:

  • Clone the git repository in a temp directory
  • Enumerate release tags (those which are nothing but a sequence of numbers with an optional "v" in front of them)
  • Create a "version" entry for each of these tags, optionally using commit hashes instead of tag for better reproducibility.
  • Automagically mark the package as a CMakePackage
I’ll just second Elizabeth’s suggestion that adding something like you suggest to `spack create` would be great.  As long as the versions generated have commit hashes, we should have a sufficient guarantee about the integrity of what is downloaded.  

Is there a way to avoid repeating the "git=" parameter over and over again in package.py when all versions of a package are available as commits in a git repository?

You *should* be able to just put the git = at the top level, as for url:

class MyPackage(Package):
    git = <git URL>
    version('1.2.3', commit='a1b2c3')

Does that work for you?

-Todd



Hadrien Grasland

Jul 18, 2018, 4:40:11 AM
to elizabet...@columbia.edu, Spack

Elisabeth,


In any case, security is a very real issue that must be dealt with.  Checksums aren't a magic guarantee; but they are FAR better than nothing.  They ensure the version YOU install is the same as the version seen by the original package author.  If that was a long time ago and nobody has complained about that version, then you have at least some confidence that it does not contain malicious code.  Without checksums, anyone could insert malicious code into any commonly used open source project and you'd never be the wiser.  Without the checksum capability, Spack would not be allowed to be used in many places it is used.

An excellent point! Let us discuss a bit the security of tarballs and Git repos, then. Here is my understanding of what can go wrong when downloading a tarball:

  • You are not talking with the server/mirror you think (authentication failure).
    • Easy attack on HTTP, much harder with HTTPS (requires breaching the TLS protocol/implementation, the target server, or the CA infrastructure).
    • Failure mode: You can get a malicious package.
  • You are talking with the right server, but it purposely gives you a bad package.
    • Can happen if either the server or the package author goes evil.
    • Failure mode: You can get a malicious package.
  • Spack tries to compensate for the "get a malicious package" failure mode with checksums
    • Robustness depends on resistance of checksum algorithm against intentional collisions
    • None with CRC, little with MD5, high with up-to-date cryptographic hashes like SHA-3.

And here is my understanding of what can go wrong when downloading from Git via HTTPS:

  • Git over HTTP is basically non-existent (and should be blacklisted if it is a thing), so we always get "hardened" HTTPS
    • If HTTPS attacks fall outside of Spack's threat model, then we can assume we're talking with the right server.
  • Security against evil server or package author depends on the download method:
    • Commit hashes give us SHA-1 checksums "for free" (not the best hash these days, but finding a collision is still hard)
    • Force-pushing tags is highly discouraged, but not forbidden. We must check for it, but it is okay to be suspicious when it happens.
    • Branches are frequently updated by many people, so using them intrinsically requires extra trust.
Therefore, here is what I understand would be needed to achieve security equivalent to tarball checksums when downloading over git:
  • For commit hashes, as long as SHA-1 is considered secure enough, git does the check for us => nothing to do, AFAIK Spack already does this
  • For branches, scary warnings and opt-in are the best options => AFAIK, Spack already does this
  • For tags, we could compute a checksum ourselves and check it => we only need to ignore .git to make the checksum reproducible AFAIK (see the sketch below)
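Here is a rough sketch of what I mean by checksumming a tag's working tree ourselves (not an existing Spack feature, just an illustration of what would be hashed):

import hashlib
import os


def tree_checksum(root):
    """Hash relative paths and file contents in a fixed order, skipping .git/."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git and walk directories in a deterministic order.
        dirnames[:] = sorted(d for d in dirnames if d != '.git')
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest.update(rel.encode('utf-8'))
            with open(path, 'rb') as f:
                digest.update(f.read())
    return digest.hexdigest()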

Do you think I have missed anything important in this analysis?



Most recent C++ packages can be installed using a variant of the following procedure:

  • git clone --branch=<release tag> --depth=1 <package repo URL>
  • cd <package dir> && mkdir build && cd build
  • cmake <config flags> ..
  • make
  • sudo make install

Except Spack does two very important things that your manual procedure above does not:

1. Check checksums
2. Ensure proper dependencies are installed BEFORE installing this package.

There is no standardized machine-readable way in CMake to specify dependencies.  Most of the work writing a Spack package is in getting the dependencies and variants right, not in downloading the source code.

Thanks for clarifying! Now I understand what you meant in your first e-mail when discussing manual procedures.

I agree that things like dependency specifications cannot be automated. My goal here would be to reach ergonomics parity with source tarballs, where IIUC Spack gives you a pre-filled template where you get for free...

  • Available versions
  • Build system (package base class)

...and you only need to manually fill in...

  • Description
  • Dependencies
  • Non-default build configuration

Packaging is boring work which we want a lot of people to be doing, so the easier we can make it, the better. I think Spack is already pretty good in this area, I would just like to make it even better for my use cases.


except in cases where I am forced into it by the upstream project.

Coming from this perspective, I was expecting "spack create <package repo URL>" to do what I would do myself when packaging such a project:

  • Clone the git repository in a temp directory
  • Enumerate release tags (those which are nothing but a sequence of numbers with an optional "v" in front of them)
  • Create a "version" entry for each of these tags, optionally using commit hashes instead of tag for better reproducibility.
  • Automagically mark the package as a CMakePackage
You can add this automagic to `spack create` if you like.  Two recommendations:

1. Run any designs past @adamjstewart before implementing them.
2. If you want to use the git download method, generate git downloads that download by hash, not tag. This is equivalent to checksums, ensuring against malicious alteration.  In other words, you should generate something like this, based on inspection of the git repo.  I promise to object to any system that encourages circumvention of checksums through the use of automagic.

version('1.2.1',
        git='https://github.com/citibeth/icebin.git',
        commit='fj2yh47r2ifisuhkuhf')

Or just generate the appropriate tarball download URL if you're downloading from GitHub, which is 90% of packages these days.

Thanks for the pointer. Once I have built up a solid idea of what I want, I'll make a ticket about it on spack/spack and ping @adamjstewart. Want me to ping you as well?

As long as SHA-1 is trusted, I agree that commit hashes are the best option, as they save us from duplicating the checksumming which git already does internally. For a fallback path once security researchers are done breaking poor old SHA-1 (I don't expect Git to switch away from it then, because it would break a lot of repos, and the Git team has repeatedly stated that the use of SHA hashes in Git is not meant as a security mechanism), see my comment above regarding git tags.



Many useful software projects exist as a patch to a more established software project which has not been upstreamed yet. For example, Verrou is a CESTAC floating-point error checker which is packaged as a valgrind patch, and Templight is a C++ template bloat debugging tool which is packaged as a clang patch.
What is the expected way of packaging such projects? Should they be packaged as a variant of the upstream project, or as a no-op package which does nothing but depend on a patched version of the upstream project?

Download from a specific git commit out of the upstream repo, and invent a Spack-only version number for that commit.  When this "hack" is no longer needed, remove your unofficial "release" from Spack and replace with the official release.

I am not sure if I fully understood you here. 

To fuel the discussion, here is my current experiment at integrating Verrou into spack by modifying the parent valgrind package : https://github.com/HadrienG2/spack/compare/e37554f...e4ff076 . Please do not mind the hacks around, this is not meant for merging yet.

OK, this is an exceptional circumstance.  99.9% of packages require only their own download.  The general idea is 1 repo = 1 tarball = 1 download = 1 Spack package.  You are right to use the resource mechanism when more than one download is needed.  In that case, I would recommend that verrou be the main download tarball (since this is the verrou package), and that valgrind be the resource that gets downloaded in addition.  You will probably need to do some extra work in the package to set up the stage.

A MUCH better approach would be if verrou's authors could simply maintain verrou as a branch off of valgrind.  Git is made for this kind of thing, and it would make everybody's life SO MUCH EASIER --- verrou's life, users' life, Spack's life.  That's what I thought you meant by "Many useful software projects exist as a patch to a more established software project which has not been upstreamed yet."

Thanks for this clarification and advice. I fully agree with you that this patch-based workflow is brittle and unpleasant, and that I would like to use it as infrequently as possible. I also agree that your proposed workflow based on branching off the upstream git repo is highly preferable when applicable.

I think the reason why verrou and templight did not use this method is that valgrind and clang historically used SVN for source control management. Now that valgrind has switched to git, I can try to convince the verrou devs to switch to a branching workflow. For clang, everything is still based on SVN, with autogenerated git mirrors, so this may be a tougher sell (I'm not sure how safe it is to build work on top of an automatically generated git repo).

For both projects, since the end result is a patched build of clang or valgrind, I suspect that I will need to mark the verrou package as conflicting with valgrind, and the templight package as conflicting with clang.

Cheers,
Hadrien

Hadrien Grasland

Jul 18, 2018, 5:07:34 AM
to sp...@googlegroups.com

Todd,


Is there a way to avoid repeating the "git=" parameter over and over again in package.py when all versions of a package are available as commits in a git repository?
You *should* be able to just put the git = at the top level, as for url:

class MyPackage(Package):
    git = <git URL>
    version('1.2.3', commit='a1b2c3')

Does that work for you?

I'm afraid not. With the following specification...

class ActsCore(CMakePackage):
    # ...description...

    homepage = "http://acts.web.cern.ch/ACTS/"
    url      = "https://gitlab.cern.ch/acts/acts-core.git"
    git      = "https://gitlab.cern.ch/acts/acts-core.git"

    version('v0.06.00', commit='7358bc8b274c5c474c98c9e44dd796666de11a9d')

    # ...dependencies, configuration, etc...

...I get the following error message when running "spack install acts-core":

==> Error: Unable to parse extension from https://gitlab.cern.ch/acts/acts-core.git.

If this URL is for a tarball but does not include the file extension
in the name, you can explicitly declare it with the following syntax:

    version('1.2.3', 'hash', extension='tar.gz')

If this URL is for a download like a .jar or .whl that does not need
to be expanded, or an uncompressed installation script, you can tell
Spack not to expand it with the following syntax:

    version('1.2.3', 'hash', expand=False)

The same specification works if git= is specified in each version() call instead of globally:

class ActsCore(CMakePackage):
    # ...description...

    homepage = "http://acts.web.cern.ch/ACTS/"
    url      = "https://gitlab.cern.ch/acts/acts-core.git"

    version('v0.06.00',
            git="https://gitlab.cern.ch/acts/acts-core.git",
            commit='7358bc8b274c5c474c98c9e44dd796666de11a9d')

    # ...dependencies, configuration, etc...

I was a bit suspicious about Spack's attempt to parse the URL, so I wondered if the top-level "url=" and "git=" weren't mutually exclusive and if I shouldn't remove the top-level "url=" when "git=" is used. But this is not the case: commenting out the "url=" results in an unhappy backtrace asking me to bring it back:

==> Error: Class constructor failed for package 'acts-core'.

Caused by:
NoURLError: Package ActsCore has no version with a URL.
  File "/home/hadrien/Software/spack/lib/spack/spack/repo.py", line 833, in get
    return package_class(spec)
  File "/home/hadrien/Software/spack/lib/spack/spack/package.py", line 672, in __init__
    f = fs.for_package_version(self, self.version)
  File "/home/hadrien/Software/spack/lib/spack/spack/fetch_strategy.py", line 996, in for_package_version
    attrs['url'] = pkg.url_for_version(version)
  File "/home/hadrien/Software/spack/lib/spack/spack/package.py", line 784, in url_for_version
    raise NoURLError(cls)

Cheers,
Hadrien

Elizabeth A. Fischer

Jul 18, 2018, 9:48:08 AM
to Hadrien Grasland, Spack
 
class MyPackage(Package):
    git = <git URL>
    version('1.2.3', commit='a1b2c3')

Does that work for you?

I'm afraid not. With the following specification...

class ActsCore(CMakePackage):
    # ...description...

    homepage = "http://acts.web.cern.ch/ACTS/"
    url      = "https://gitlab.cern.ch/acts/acts-core.git"
    git      = "https://gitlab.cern.ch/acts/acts-core.git"

    version('v0.06.00', commit='7358bc8b274c5c474c98c9e44dd796666de11a9d')

    # ...dependencies, configuration, etc...


This would be a nice feature to add in a PR!

Gamblin, Todd

Jul 18, 2018, 10:29:54 AM
to Hadrien Grasland, sp...@googlegroups.com
What if you remove `url`?





Gamblin, Todd

Jul 19, 2018, 12:06:29 AM
to Gamblin, Todd, Hadrien Grasland, sp...@googlegroups.com
Hadrien:

Can you see whether https://github.com/spack/spack/pull/3161 fixes this for you?

-Todd

Hadrien Grasland

Jul 19, 2018, 7:28:09 AM
to elizabet...@columbia.edu, Spack

(Adding back forgotten mailing list CC)

Elizabeth,


You are right... security in Spack nods in the right direction, but it is half-baked.  Shortcomings include:

1. Widespread use of no-longer-secure checksum algos (although better ones are actually preferred; just not used much so far).
2. Lack of default "secure mode" that refuses to install things without a checksum.  Or without a secure-enough checksum.  Spack might warn, but that is not the same as a hard check.
3. Reliance on Git for security, even though docs say not to.  What if a rogue Git server is running somewhere?
4. New package PRs being submitted without any checksummable version at all.

Would you be interested in spearheading an effort to move Spack in the right direction on these issues?

I'm not sure if I would be the best person for this, for two broad reasons:

  • I'm not that proficient on security matters. It is a topic which I study occasionally out of curiosity, not one which I practice daily at work, and I'm likely to make mistakes and approximations in an area which does not tolerate them. Two examples from recent conversations:
    • In my last e-mail, where I attempted to model a bit the threats of HTTPS and Git downloads, I implicitly made the assumption that the user's computer and software were safe. In a security context, these assumptions should be documented.
    • As Todd helpfully pointed out on my ticket concerning "spack checksum"'s current behaviour ( https://github.com/spack/spack/issues/8741 ), I was confusing collision resistance with second preimage resistance. Approximate terminology is dangerous in cryptography.
  • I'm worried that it could blow the limited time budget which I currently have available for playing with Spack at work, with respect to other areas where I think I could make progress more quickly like packaging my usual software or improving the user experience of creating packages from Git repos.


Packaging is boring work which we want a lot of people to be doing, so the easier we can make it, the better. I think Spack is already pretty good in this area, I would just like to make it even better for my use cases.

Here is where we disagree.  And this might have to do with roles: is Spack to be used by users to build their own software?  Or by sysadmins to more easily install software in response to user requests?  I am the former, but I believe that most of Spack is used by the latter.

In my experience, I only write Spack packages for stuff I need.  And what I need doesn't change very much over time.  So once I've gotten the packages written, I don't need to write much more.  Because this is an infrequent job, it is not worthwhile for me to learn fancy automagic tools to make the job easier.  My approach is generally to copy an existing package and fill in the blanks.  In general, I write Spack packages less frequently than I file taxes, but more frequently than I renew my driver's license.

@adamjstewart disagrees with me on this issue.  He has put a lot of work into nifty automagic tools that I am too busy to learn how to use.  If I were a sysadmin routinely writing packages and installing stuff for others, I would probably find them useful.

Maybe a bit of context could help clarify what I am trying to do here.

I'm working in the computing department of a high energy physics (HEP) lab. As you may have heard, the HEP community has historically used various in-house software packaging solutions (CMT, LCGCMake...), but is currently investigating moving to more "standardized" software packaging solutions. Various solutions are investigated in this respect, including for example Nix, Portage and Spack. An incomplete description of this effort is available @ https://hepsoftwarefoundation.org/activities/packaging.html .

After studying the various alternatives which I know about, I am currently convinced that Spack is the most interesting solution that was proposed so far:

  • It seems to be built for portability, without favoring a particular Linux distribution (unlike Brew/Nix/Portage, or existing HEP solutions which have a strong RHEL bias).
  • It does not seem too tightly tied to a specific software community, unlike conda which has a strong bias towards python and the Continuum Analytics company.
  • It can easily install multiple software versions concurrently, which is invaluable for migrations and "reproducible environments" like containers.
  • It can easily manage multiple build configurations of a single software version, which is invaluable for compiler portability studies or benchmarking work.
  • It is not too heavily opinionated towards isolation, so leveraging system packages (like GPU drivers, MPI impls, Linux perf...) is easy.
  • It has a large existing package base and maintainer force, which reduces the porting effort that the limited HEP software manpower needs to do on its own.
  • It does not need to access network services after installation (unlike, say, LCGCMake).

Now, these are my opinions, and there are of course many divergent ones around:

  • The LCGCMake lobby is powerful, and will probably try to downplay the usefulness and highlight the weaknesses of these new solutions
  • The competition with proponents of other package managers (especially the macOS community, which would strongly appreciate Brew becoming the HEP standard) is likely to be fierce as well.

As a junior member of the HEP community, I cannot do much about the political part, but I can help Spack become a more interesting technical choice for us, by...

  • Packaging "tricky" software which I have personal experience building, providing convincing examples of how Spack can be useful to the HEP community.
  • Making it easy for others to join the Spack packaging experiment (we have hundreds of HEP packages to convert, that's a lot of work which should be distributed).


Spack Environments... see

I'm not sure if I fully understood how this is supposed to work. Could you provide a few sample CLI commands to see what the daily interaction would look like? I am particularly interested in what using a package through a Spack environment would look like.

Cheers,
Hadrien

Hadrien Grasland

Jul 19, 2018, 8:17:57 AM
to sp...@googlegroups.com

Todd,

This pull request works if I have a "git=", but not a "url=", at the top-level scope:

    # url      = "https://gitlab.cern.ch/acts/acts-core.git"
    git      = "https://gitlab.cern.ch/acts/acts-core.git"

    version('develop',
            branch='master')
    version('v0.06.00',
            commit='7358bc8b274c5c474c98c9e44dd796666de11a9d')

If both a URL and a git repository are specified (by uncommenting the "url=" above), then Spack tries to use the URL method for the git commits, which does not end well:

==> Error: Unable to parse extension from https://gitlab.cern.ch/acts/acts-core.git.

If this URL is for a tarball but does not include the file extension
in the name, you can explicitly declare it with the following syntax:

    version('1.2.3', 'hash', extension='tar.gz')

If this URL is for a download like a .jar or .whl that does not need
to be expanded, or an uncompressed installation script, you can tell
Spack not to expand it with the following syntax:

    version('1.2.3', 'hash', expand=False)

I think either of the following approaches would be better for ergonomics:

  • Accept both "url=" and "git=" at top-level scope, use Git method when commit/tag/branch are specified in the version() call and URL method otherwise.
  • Do not accept both "url=" and "git=" at top-level scope, error out if they are specified at the same time.
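To clarify what I mean by the first option, here is a rough sketch of the selection rule (illustrative only, not Spack's actual code):

# Keyword arguments of version() that imply the Git fetch method.
GIT_VERSION_KEYS = {'commit', 'tag', 'branch'}


def pick_fetch_method(version_kwargs, has_top_level_git, has_top_level_url):
    """Choose the fetch method for a single version() entry."""
    if GIT_VERSION_KEYS & set(version_kwargs):
        if not has_top_level_git:
            raise ValueError("git-style version() arguments but no top-level git=")
        return 'git'
    if has_top_level_url:
        return 'url'
    raise ValueError("no fetch method available for this version")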

Cheers,
Hadrien

Elizabeth A. Fischer

Jul 19, 2018, 9:20:54 AM
to Hadrien Grasland, Spack
Hadrien, 

Would you be interested in spearheading an effort to move Spack in the right direction on these issues?

I'm not sure if I would be the best person for this, for two broad reasons:

  • I'm worried that it could blow the limited time budget which I currently have available for playing with Spack at work, with respect to other areas where I think I could make progress more quickly like packaging my usual software or improving the user experience of creating packages from Git repos.

No problem, I agree this is probably not where you should be spending your time.
 


Spack Environments... see

I'm not sure if I fully understood how this is supposed to work. Could you provide a few sample CLI commands to see what the daily interaction would look like? I am particularly interested in what using a package through a Spack environment would look like.


Basically... a Spack Environment is a list of specs that are installed one-by-one, and then loaded as a whole.  This has several advantages over not using environments:

 1. When you build a Spack Environment, you also get a script that loads the modules built by that environment.  No need to use a dozen slow "spack load" commands.

 2. "spack load" is non-deterministic, and therefore fundamentally broken.  Consider the following:

$ spack install foo@1.1
$ spack load foo    # works
$ spack install foo@1.2
$ spack load foo     # fails

This becomes a problem on a large shared HEP system where many people are simultaneously using many different versions of the same package.  Spack Environments isolate you from whatever might be going on in the system OUTSIDE of your environment.

  3. If Spack Environments are used exclusively, then they provide a way to garbage-collect installed packages that aren't used anymore.

  4. Spack Environments + Spack Setup provides a way to seamlessly use Spack not just to install your dependencies, but also to develop, build and debug your own (CMake-based) package.  The idea is that some packages in an environment (at your choosing) will not be installed.  Instead, a setup script that calls CMake is generated for that package.  It is then your responsibility to run the setup script (instead of your "cmake" command), then type make and make install.  Once you do "make install", now your package is fully installed and available for use.  Spack Setup saves you the work, essentially, of implementing a package.py for your package twice --- once for Spack and once for development purposes.  I rely heavily on it.  Spack Setup can also be extended to work with Autotools or other kinds of build systems, if that is needed.


Here you can see the Spack Environment I rely on; it builds and loads up about 100 packages.  The packages modele-tests, modele, ibmisc, pism and icebin are installed into that environment in "setup" mode because I am actively working on their source code.

Once the environment is generated, I load it with this file:

That is basically a bit of customization around a Spack-generated loads file, which consists of 100 module load commands.

It would also be possible to "render" environments as:
  1. A symlink tree
  2. A single module file that loads the entire environment, rather than loading 100 individual package modules.

-- Elizabeth




Gamblin, Todd

Jul 19, 2018, 9:02:24 PM
to Hadrien Grasland, sp...@googlegroups.com
Hadrien:

I think either of the following approaches would be better for ergonomics:

  • Accept both "url=" and "git=" at top-level scope, use Git method when commit/tag/branch are specified in the version() call and URL method otherwise.
  • Do not accept both "url=" and "git=" at top-level scope, error out if they are specified at the same time.
I added a commit here to implement #2:

I like 2 better because I think that #1 would end up confusing people.  I’m also not convinced that we’d never get ambiguous cases (as fetchers evolve).  With this you get one “default” strategy at the top level and the rest need to be explicit.  That is easier to keep in my head, for me at least.

-Todd

Adam Stewart

Jul 21, 2018, 6:59:20 PM
to Spack
Hey everyone,

I don't often check the Google Group, but I figured I would chime in on a small subset of this already long thread. I probably won't remember to read any replies, so ping me on GitHub or Slack if you actually want my attention. Honestly, I keep forgetting we still have a Google Group. Do we still need this thing?

As far as Spack's automatic creation of new packages is concerned, there are a couple of things I would like to point out. As you mentioned, `spack create` currently only accepts URLs; it can't take a git/hg/svn repo. If you have the time, I would wholeheartedly support any efforts to make this work! I really like the idea that Spack might someday support all of the same features for all download methods.

For example, `spack versions` and `spack checksum` also only work for URLs. As you've noted, there are many different fetching strategies that don't play by these rules. Another good example is https://github.com/spack/spack/pull/2718, where I tried to add a new fetch strategy to properly download Python packages from PyPI. There happens to be a magic URL that allows you to download any version of a Python package, but it isn't listable, so `spack versions` and `spack checksum` don't work. If you want to add a new version, you have to manually download it and checksum it yourself before copying and pasting that all into the package.py.

I would love to see a future where I can `spack versions`/`spack checksum`/`spack create` a URL, Git repo, and PyPI repo. I think https://github.com/spack/spack/pull/3161 is a step in the right direction for making git a first class citizen, but there is much work left to do. If I wasn't a grad student, I think this task would probably be my highest priority, but it involves messing with dozens of different sections of Spack that I am less familiar with, and alas I have no free time :(

In the meantime, if you need to create a new package but don't have a URL, you can still fill in most of the blanks like so:

$ spack create --name foo --template cmake

P.S. You may also be interested in https://github.com/spack/spack/pull/3626. I haven't looked into it in too much detail, but it purports to autogenerate many of the variants and dependencies of CMake packages.

Adam

Gamblin, Todd

Jul 21, 2018, 7:53:28 PM
to Adam Stewart, Spack
Honestly, I keep forgetting we still have a Google Group. Do we still need this thing?

I think it’s helpful for folks who want to just use email, and honestly I think a lot of things can get lost in GitHub issues, so it’s nice to have a central place for announcements and organizing things like telcons.

There are upwards of 260 people on the google group so I don’t think just removing it is a good idea :)

-Todd


Adam Stewart

Jul 22, 2018, 8:42:23 PM
to Spack
Okay, one other comment I wanted to make on the secure download methods discussion.

Before I say anything, I want to be up front and say that I know next to nothing about security! With that said, I personally don't feel particularly confident in the ability of checksums to protect us from bad source code. When I add a new version of a package, I ask Spack to calculate a checksum for me and assume it is correct. I have no other means to verify that the checksum I'm adding is safe. Also, package developers keep re-releasing versions on me, so if I want to fix the broken checksum, I have to verify over email with some stranger that they aren't a hacker and that this change is legit. Basically, checksums are better than nothing, but they aren't great? I don't feel like checksums provide us with significantly more security than Git does, but like I said, I don't know anything about security.

I really like Git, and I frequently encounter software that is either very new or has very few users, so they have 0 stable releases. They generally suggest installing from the latest master or develop branch. Personally, I would like to respect their wishes and avoid my users using ancient versions of software that likely contain bugs or are missing many features (it is pre-release software, after all). So if we can get at least some level of security checking involved, that would be great. 

I don't really like pushing package writers away from using Git and other VCS. If there is a stable release tarball, I'll ask people to add it as the default version installed. But if there isn't, I don't really like making them create their own "fake" release.

P.S. Can someone comment on https://github.com/spack/spack/issues/6785? There has to be somewhat secure methods of downloading from Mercurial and Subversion like there are with Git.

P.P.S. Hey, when are we going to switch every package from MD5 to SHA256?? We should do that before SC. We keep getting new users who are security conscious and who call us out on our use of a deprecated security method.

Adam

Hadrien Grasland

Jul 23, 2018, 3:04:55 AM
to sp...@googlegroups.com

Hi Adam,


Okay, one other comment I wanted to make on the secure download methods discussion.

Before I say anything, I want to be up front and say that I know next to nothing about security! With that said, I personally don't feel particularly confident in the ability of checksums to protect us from bad source code. When I add a new version of a package, I ask Spack to calculate a checksum for me and assume it is correct. I have no other means to verify that the checksum I'm adding is safe. Also, package developers keep re-releasing versions on me, so if I want to fix the broken checksum, I have to verify over email with some stranger that they aren't a hacker and that this change is legit. Basically, checksums are better than nothing, but they aren't great? I don't feel like checksums provide us with significantly more security than Git does, but like I said, I don't know anything about security.

Just to clarify this point, a Spack package typically involves at least three persons/roles:

  • The end user (who installs the package)
  • The author (who maintains the software)
  • The packager (who maintains the package.py file)

As long as they are cryptographically secure and not allowed to be overwritten by PRs, checksums provide the following useful guarantees:

  • All end users get the same source code when they ask for a given software revision.
  • End users get the same source code as the person who wrote the package.py did.

The benefit of the first property to a package manager should be obvious. It means that we get reproducible bug reports, for example. It also provides a partial guarantee against software mirror compromises or software authors going rogue, in the sense that if a rogue package is downloaded in place of a known-legitimate software version, Spack is able to detect and report it to end users.

The second property is more subtle. Given the (as of now unchecked) constraint that the package.py maintainers are independent from the software authors, it allows introducing (as of now nonexistent) procedures for reviewing the quality or safety of software versions before allowing them to be merged into the Spack repo. Basically, it allows requesting package security and QA reviews before making them available to end users. We do not do this today, and it would probably be too expensive to introduce that with the manpower that is currently available, but it's nice to be able to introduce such extra security measures in the future if problems ever arise.

In principle, git and hg commit hashes can provide similar guarantees to tarball checksums, because they are a function of the commit's contents and ancestors. AFAIK, a bit of care is required in practice, because version control systems do not necessarily fully check commit integrity during normal downloads (something like git fsck or hg verify is needed). Also, another problem with the use of crypto hashing algorithms in version control systems is that it is very difficult for a VCS to switch to a new hashing algorithm when the underlying crypto is getting weak (as in the SHA-1 case), because doing so would invalidate every commit identifier in the distributed world.
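For illustration, the kind of extra verification I have in mind would look something like this (a sketch, not Spack code):

import subprocess


def verify_clone(repo_dir, commit):
    """Check object integrity, then confirm the requested commit is present."""
    # 'git fsck --full' walks the object database and re-checks every object's
    # SHA-1 against its content.
    subprocess.check_call(['git', 'fsck', '--full'], cwd=repo_dir)
    # 'git rev-parse --verify <commit>^{commit}' fails unless that exact
    # commit object exists locally.
    subprocess.check_call(
        ['git', 'rev-parse', '--verify', commit + '^{commit}'], cwd=repo_dir)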

To conclude, no checksumming will ever prove that software is safe. Due to the halting problem, and the difficulty of defining "security" in code, devising a fully automated security check is likely to be provably impossible. What checksumming (or explicit commit hashes) can do is to allow confidence in a software version to grow over time as more eyes end up looking at the package. This is not foolproof, e.g. we could imagine malware which waits for a signal from a remote server before going rogue, but it is a nice basic guarantee to have.

Cheers,
Hadrien

Gamblin, Todd

Jul 23, 2018, 3:28:17 AM
to Hadrien Grasland, sp...@googlegroups.com

The second property is more subtle. Given the (as of now unchecked) constraint that the package.py maintainers are independent from the software authors, it allows introducing (as of now nonexistent) procedures for reviewing the quality or safety of software versions before allowing them to be merged into the Spack repo. Basically, it allows requesting package security and QA reviews before making them available to end users. We do not do this today, and it would probably be too expensive to introduce that with the manpower that is currently available, but it's nice to be able to introduce such extra security measures in the future if problems ever arise.

FWIW, this is something we’re considering as we ramp up the infrastructure for hosting binaries.

There are services that, given a tarball (or just a sha256sum), will check for vulnerabilities in the NVD and other places.  It would be possible to integrate this with Travis or another service to check package PRs, and add an approval step based on such a scan.  We could also do a nightly scan on the whole builtin repo to keep things current, and to automatically update packages (e.g. by deprecating versions) if any issues come up.

Currently we don’t do any of this, but I think it would be great to move in this direction.

-Todd


Hadrien Grasland

Jul 23, 2018, 5:38:45 AM
to elizabet...@columbia.edu, Spack

Elizabeth,


Spack Environments... see

I'm not sure if I fully understood how this is supposed to work. Could you provide a few sample CLI commands to see what the daily interaction would look like? I am particularly interested in what using a package through a Spack environment would look like.


Basically... a Spack Environment is a list of specs that are installed one-by-one, and then loaded as a whole.  This has several advantages over not using environments:

 1. When you build a Spack Environment, you also get a script that loads the modules built by that environment.  No need to use a dozen slow "spack load" commands.

 2. "spack load" is non-deterministic, and therefore fundamentally broken.  Consider the following:

$ spack install foo@1.1
$ spack load foo    # works
$ spack install foo@1.2
$ spack load foo     # fails

This becomes a problem on a large shared HEP system where many people are simultaneously using many different versions of the same package.  Spack Environments isolate you from whatever might be going on in the system OUTSIDE of your environment.

  3. If Spack Environments are used exclusively, then they provide a way to garbage-collect installed packages that aren't used anymore.

  4. Spack Environments + Spack Setup provides a way to seamlessly use Spack not just to install your dependencies, but also to develop, build and debug your own (CMake-based) package.  The idea is that some packages in an environment (at your choosing) will not be installed.  Instead, a setup script that calls CMake is generated for that package.  It is then your responsibility to run the setup script (instead of your "cmake" command), then type make and make install.  Once you do "make install", now your package is fully installed and available for use.  Spack Setup saves you the work, essentially, of implementing a package.py for your package twice --- once for Spack and once for development purposes.  I rely heavily on it.  Spack Setup can also be extended to work with Autotools or other kinds of build systems, if that is needed.


Here you can see the Spack Environment I rely on; it builds and loads up about 100 packages.  The packages modele-tests, modele, ibmisc, pism and icebin are installed into that environment in "setup" mode because I am actively working on their source code.

Once the environment is generated, I load it with this file:

That is basically a bit of customization around a Spack-generated loads file, which consists of 100 module load commands.

It would also be possible to "render" environments as:
  1. A symlink tree
  2. A single module file that loads the entire environment, rather than loading 100 individual package modules.

Thanks for the explanation. Looks very promising indeed! I hope that as the project matures, someone with in-depth knowledge of this mechanism will be able to give it equal treatment to modules and views in the Spack end user documentation :)

Hadrien
