State of hashdist development


Matthew Krafczyk

Oct 26, 2016, 6:41:55 PM
to hashdist
Hello, I am curious about the state of current development of hashdist.

The hashstack git repo seems pretty active, but the hashdist repo seems somewhat silent.

The reason I ask is that I'm interested in using hashdist as part of an initiative in reproducible research.

Ideally, I'd like researchers to be able to open a tarball with all necessary source code, a copy of
hashdist and yaml files describing how to build the correct environment. Given a working compiler,
hashdist should be able to build everything necessary to build and run a researcher's code. This can
also be done cleanly within the directory containing everything else from that tarball.

I've made some small modifications to hashdist to get part of the way there. I've added a
HASHDIST_ROOT_DIR environment variable so that hashdist can operate out of a specific directory instead
of requiring access to the home directory. This has worked pretty well, but a few things are still
missing, such as metapackages that could be used to specify a build environment: a specific bash
executable, compiler, linker, etc.
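
For readers unfamiliar with the idea, here is a minimal sketch of how an environment-variable override of this kind could work. It is not the actual patch: the helper name `resolve_root_dir` and the `~/.hashdist` fallback location are assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_root_dir(environ=None):
    """Return the directory the tool should keep its state in.

    Hypothetical helper: prefers HASHDIST_ROOT_DIR when it is set and
    non-empty, otherwise falls back to a directory under the user's home.
    """
    env = os.environ if environ is None else environ
    override = env.get("HASHDIST_ROOT_DIR")
    if override:
        # Expand "~" and normalise to an absolute path so that later code
        # does not depend on the current working directory.
        return Path(override).expanduser().resolve()
    # Assumed default location, used only when no override is given.
    return Path.home() / ".hashdist"
```

With something like this in place, an unpacked tarball could set `HASHDIST_ROOT_DIR` to a subdirectory of itself so that nothing is written outside the project tree.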

Thoughts?

Chris Kees

Oct 26, 2016, 11:36:45 PM
to hash...@googlegroups.com
Hello Matthew,

It's true things have been quiet. At the moment hashdist is cursed with doing most of what I need it to do while at the same time I'm swamped and can't afford to put much of my own time into it.  I can't really speak for the other developers, but I suspect that's the situation with most.  If I can find somebody with the right skills at my lab I will likely encourage some additional development over the next year, which might relate to your issues. We frequently want to reproduce our software stack in HPC environments where specifying a build environment is pretty clunky at present.  

It seems like what you want to do is certainly consistent with the goals of hashdist, so if you want to go ahead with the project I'll do my best to help you make the modifications to hashdist that you need.

Chris

--
You received this message because you are subscribed to the Google Groups "hashdist" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hashdist+unsubscribe@googlegroups.com.
To post to this group, send email to hash...@googlegroups.com.
Visit this group at https://groups.google.com/group/hashdist.
To view this discussion on the web visit https://groups.google.com/d/msgid/hashdist/06d40a0c-34c1-451b-bf4c-4d03120dd70c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthew Krafczyk

Oct 28, 2016, 11:34:59 AM
to hashdist
Hi Chris,

Thanks a lot for your reply.

This is all good to hear. However, shortly after posting this, a colleague pointed me to the spack project, which seems more mature for this task.
For the moment, I'm going to see whether spack can be molded to do what I want. I'm definitely going to keep a close eye on hashdist though.

Matthew

Chris Kees

Oct 28, 2016, 12:33:33 PM
to hash...@googlegroups.com
Sounds good. Let us know how it goes. I'm aware of spack but haven't had time to experiment with it. -Chris


Ondřej Čertík

Oct 29, 2016, 3:19:51 PM
to hash...@googlegroups.com
Hi Matthew,

Let us know how you solved your problem with Spack. I would be
interested in doing the same with Hashdist.

Ondrej

Mario Schreiber

Dec 26, 2016, 10:26:37 AM
to hashdist

Hi Ondrej,
I am also wondering about the short- and long-term plans for hashdist. One or two weeks ago, I asked for a very simple feature (downloading all prerequisites of a package); you submitted an issue for it, and that was it. Hundreds of similar issues are lying around, and nobody cares about them. Maybe you shouldn't try to implement everything that spack offers, but at least make the basic features usable.
When hashdist is mature at a very basic level, I am sure many people will help to improve one or the other feature. Right now, however, I have doubts whether the design is even capable of achieving the goals.
Just my 2 cents.

Mario

Ondřej Čertík

Dec 27, 2016, 12:14:24 AM
to hash...@googlegroups.com
Hi Mario,

On Mon, Dec 26, 2016 at 8:26 AM, Mario Schreiber
<schreibe...@gmail.com> wrote:
>
> Hi Ondrej,
> I am also wondering about the short- and long-term plans for hashdist. One or
> two weeks ago, I asked for a very simple feature (downloading all
> prerequisites of a package); you submitted an issue for it, and that was it.
> Hundreds of similar issues are lying around, and nobody cares about them. Maybe
> you shouldn't try to implement everything that spack offers, but at least
> make the basic features usable.
> When hashdist is mature at a very basic level, I am sure many people will
> help to improve one or the other feature. Right now, however, I have doubts
> whether the design is even capable of achieving the goals.

Chris and I talked about this a few weeks ago over the phone. My plan is
to use Hashdist as an interface to Conda. We could also interface with
Spack, though I think Conda is more mature. We would simply let Conda do
the hard work of managing and building packages. That way the only thing
we have to maintain in Hashdist is the preparation of the sources for
Conda (or Spack).

Out of curiosity, why can't you use Spack or Conda now? There is also
Conan (https://www.conan.io/). When we started Hashdist, none of these
were available. Now there seem to be quite a few options. I still miss
the profiles and configurability of Hashdist/Hashstack, though, so I
think that's what Hashdist should do, leaving the binary stuff and
environment management to Conda, which I think already does that part
better.

Regarding the design, I don't see it as limiting in any way; it's just a
matter of having time to implement things.

Ondrej

Gamblin, Todd

Dec 27, 2016, 12:41:09 AM
to hash...@googlegroups.com
Hi Ondrej,

In the plan below, I’m a little unclear on what the hashdist layer would continue to provide. Would hashdist just do profile management? i.e., history and build preferences? Also, who’s your target audience going forward? Are you targeting HPC or mostly end-user machines?

I’m under the impression that there are things Conda just can’t (or doesn’t care to) build, like Cray binaries. And the dependency model, AFAIK, doesn’t really handle combinatorial versions of things (multi-compiler, multi-mpi, etc.). How would hashdist handle that stuff with Conda? How would you name the binaries?

I’m curious because I’d like to have some better profile/environment features in Spack, and I think we have a lot of what you might need in the `packages.yaml` configuration we added this year:

https://spack.readthedocs.io/en/latest/build_settings.html#concretization-preferences

You can set your preferred MPI implementation, compiler, variants etc in a single file, per-package or for all of them, in multiple scopes (defaults, spack instance, user):

https://spack.readthedocs.io/en/latest/configuration.html#configuration-scopes
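
To illustrate the scope idea in general terms, here is a rough sketch of layered preferences where later scopes override earlier ones. This is not Spack's implementation, and Spack's real merge rules are more nuanced; the function `merge_scopes`, the Python dictionaries standing in for `packages.yaml` files, and the concrete compiler and MPI names are all assumptions chosen for illustration.

```python
from copy import deepcopy


def merge_scopes(*scopes):
    """Merge per-package preferences; later scopes win over earlier ones.

    Hypothetical helper: each scope is a dict shaped loosely like a
    packages.yaml file. Settings are overridden per key, not deep-merged.
    """
    merged = {}
    for scope in scopes:
        for package, settings in scope.get("packages", {}).items():
            # deepcopy keeps the input scopes untouched by later updates.
            merged.setdefault(package, {}).update(deepcopy(settings))
    return {"packages": merged}


# Three illustrative scopes, lowest precedence first.
defaults = {"packages": {"all": {"compiler": ["gcc"],
                                 "providers": {"mpi": ["openmpi", "mpich"]}}}}
instance = {"packages": {"all": {"compiler": ["intel", "gcc"]}}}
user = {"packages": {"all": {"providers": {"mpi": ["mvapich2"]}},
                     "hdf5": {"variants": "+mpi"}}}

effective = merge_scopes(defaults, instance, user)
```

A "per-project" scope of the kind discussed in this thread would, in this picture, simply be one more dictionary passed last.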

What we don’t have at the moment is something resembling a “per-project” config scope, which is more like what a hashdist profile would do. I think that would be pretty easy to add. Would you guys be interested in contributing some type of profile capability to Spack? I think a lot of the infrastructure you’d need is already there, and you’d get the added control over compilers, the dependency model, and the DAG hashing that Spack offers. Binary packaging is also coming soon, courtesy of CERN:

https://github.com/LLNL/spack/pull/445

-Todd


Ondřej Čertík

Dec 28, 2016, 1:46:33 PM
to hash...@googlegroups.com
Hi Todd,

On Mon, Dec 26, 2016 at 10:41 PM, Gamblin, Todd <gamb...@llnl.gov> wrote:
> Hi Ondrej,

The way I am envisioning it is that we use Hashdist just like now, so
you have a profile and a list of packages (Hashstack). Just like
Spack, Hashdist will decide exactly how each package will be built.
Currently Hashdist itself does the building and then enabling/loading
the "environment/profile", so that you can use those installed
packages. I haven't followed the Spack development too closely, so I
can't speak for Spack, but Conda handles the binary packages very
nicely and has a large community around it. Hashdist puts the package
hash (calculated from sources + build script + dependencies, etc.) as
a version into the package.
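
As a rough illustration of that hashing idea (not Hashdist's actual algorithm), the sketch below derives a package hash from its source digest, build script, and the hashes of its dependencies. The function name `package_hash`, the JSON serialization, the use of SHA-256, and the 12-character truncation are all assumptions.

```python
import hashlib
import json


def package_hash(name, source_digest, build_script, dependency_hashes):
    """Return a short hash that changes whenever any build input changes.

    Hypothetical helper: dependency hashes are sorted so that the order
    in which dependencies are listed does not affect the result.
    """
    payload = json.dumps(
        {
            "name": name,
            "source": source_digest,
            "build": build_script,
            "deps": sorted(dependency_hashes),
        },
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Changing a dependency's build script changes the hash of everything
# that depends on it, even though hdf5's own inputs are identical.
zlib = package_hash("zlib", "sha256:aaa", "./configure && make install", [])
zlib_static = package_hash("zlib", "sha256:aaa", "./configure --static && make install", [])
hdf5_a = package_hash("hdf5", "sha256:bbb", "./configure --with-zlib", [zlib])
hdf5_b = package_hash("hdf5", "sha256:bbb", "./configure --with-zlib", [zlib_static])
```

Because each hash folds in the hashes of its dependencies, two builds of the same package against different dependency builds get different identities.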

I should stress that I got this idea from Aron Ahmadia, and I think
Chris doesn't mind this direction either.

Now I can answer your questions:

>
> In the plan below, I’m a little unclear on what the hashdist layer would continue to provide. Would hashdist just do profile management? i.e., history and build preferences?

Not just that, also the way you can easily configure each package,
say, you want to enable or disable some compile time option of a given
package, and have two environments/profiles that allow you to easily
switch between both. Essentially Hashdist would do everything that it
does now, but would partner with Conda to handle things that Conda
does well.

> Also, who’s your target audience going forward? Are you targeting HPC or mostly end-user machines?

Just like now, we are targeting both HPC and end-user machines.

>
> I’m under the impression that there are things Conda just can’t (or doesn’t care to) build, like Cray binaries.

Hashdist provides the sources of the package/stack that builds on
Cray. We only use Conda as the package manager, but we provide our own
packages.

> And the dependency model, AFAIK, doesn’t really handle combinatorial versions of things (multi-compiler, multi-mpi, etc.). How would hashdist handle that stuff with Conda? How would you name the binaries?

Hashdist handles the combinatorial explosion of versions by using
hashes. Conda allows you to put such a hash into the version ---
technically, as a first iteration, I would set each package version as
1.0, and put the hash into a "tag" (I think they call it a "build
string"), which Conda uses as part of the version. So there is no
problem, Conda can already do everything that we need.
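
A small sketch of that naming scheme follows, assuming the usual Conda package filename layout of `name-version-build.tar.bz2`. The helper names and the example hash are made up for illustration, and whether Hashdist would produce names exactly like this is an assumption.

```python
def conda_style_filename(name, content_hash, version="1.0"):
    """Build a package filename with the hash used as the build string.

    Hypothetical helper: the fixed "1.0" version mirrors the first
    iteration described above. A '-' in the build string would make the
    filename ambiguous, so it is rejected.
    """
    if "-" in content_hash:
        raise ValueError("build string must not contain '-'")
    return f"{name}-{version}-{content_hash}.tar.bz2"


def split_conda_style_filename(filename):
    """Recover (name, version, build) from a filename built above."""
    stem = filename[: -len(".tar.bz2")]
    # Split from the right so that package names containing '-' survive.
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


# Two builds of the same package with different inputs stay distinct.
gcc_build = conda_style_filename("hdf5", "3f9a1c0b7d2e")
intel_build = conda_style_filename("hdf5", "a41b09ce77f0")
```

Here the two hashes stand in for the same package built with different compilers, which is how the combinatorial cases would be kept apart.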

View Conda as a binary distribution, and Hashdist as a source distribution.

>
> I’m curious because I’d like to have some better profile/environment features in Spack, and I think we have a lot of what you might need in the `packages.yaml` configuration we added this year:
>
> https://spack.readthedocs.io/en/latest/build_settings.html#concretization-preferences
>
> You can set your preferred MPI implementation, compiler, variants etc in a single file, per-package or for all of them, in multiple scopes (defaults, spack instance, user):
>
> https://spack.readthedocs.io/en/latest/configuration.html#configuration-scopes
>
> What we don’t have at the moment is something resembling a “per-project” config scope, which is more like what a hashdist profile would do. I think that would be pretty easy to add. Would you guys be interested in contributing some type of profile capability to Spack? I think a lot of the infrastructure you’d need is already there, and you’d get the added control over compilers, the dependency model, and the DAG hashing that Spack offers. Binary packaging is also coming soon, courtesy of CERN:
>
> https://github.com/LLNL/spack/pull/445

We also have an open PR for binary packages:

https://github.com/hashdist/hashdist/pull/314

But while both your binary package PR and ours are good to have, Conda
already does this. In fact, my understanding is that this is precisely
what Conda is for: handling binary packages, mirrors, relocation of
binaries (rpath, cmake, ...), etc. So I would much rather use Conda,
which has years of experience and bug fixing behind its handling of
binaries, than a new PR (in either Hashdist or Spack).

Rather than trying to reproduce everything that Conda does, and there
is a lot of it, I just want to use it. Spack will have to implement
all the binary package handling that Conda already provides, and that's
a lot of work, and I don't have time to do that currently.

Ondrej

P.S. Another issue with Spack is that it's LGPL licensed. If there is
a choice, I would much rather have a BSD/MIT licensed tool, like
Hashdist or Conda.

Sumana Harihareswara

Feb 2, 2020, 8:14:01 AM
to hashdist
Hi -- I was just updating some documentation about Python packaging and distribution tools and started updating the entry about Hashdist, and came across this discussion.

If people are interested in finding funding to improve Hashdist in some way, the Chan Zuckerberg Initiative's Essential Open Source for Science funding, or other (non-academic) grants for open source work, might be of interest. I'll be at the Exascale Computing Project Annual Meeting this week in case anyone wants to talk about that or have other packaging discussions related to Python (I've been working on PyPI for the last few years, and pip starting this year).

-Sumana Harihareswara
Changeset Consulting


Chris Kees

Feb 2, 2020, 11:24:32 PM
to hash...@googlegroups.com
Hi Sumana,

I'm not going to be at the meeting, but I'd be interested in a
discussion. My projects are currently using some mix of hashdist, pip,
and conda along with simple use of the modules system on various HPC
environments. It's not very satisfactory as a whole, but I haven't had
much time to put into hashdist. We forked hashdist and hashstack into
the erdc github organization to include some support for source and
binary remote caches that I implemented, but those are all the changes
we've made in the last few years.

Chris