Profile loading and management


Ondřej Čertík

Dec 10, 2014, 3:12:31 PM
to hash...@googlegroups.com, Dag Sverre Seljebotn
Hi,

Let's discuss how to simplify profile loading. Here is a list of my
profiles that I use:

certik@redhawk:~/repos/hashstack(w)$ ll
total 192
lrwxrwxrwx 1 certik certik 38 Nov 7 17:33 basis -> /local/certik/bld/profile/k4g2r7wjqcth/
lrwxrwxrwx 1 certik certik 38 Dec 9 16:38 default -> /local/certik/bld/profile/x2yerpwugoig/
lrwxrwxrwx 1 certik certik 38 Oct 22 13:28 gtk -> /local/certik/bld/profile/mgz74vgzjtdp
lrwxrwxrwx 1 certik certik 38 Nov 7 17:32 hd_base -> /local/certik/bld/profile/d2nhm3s6qzwn/
lrwxrwxrwx 1 certik certik 38 Dec 2 17:43 numpy -> /local/certik/bld/profile/kkhizx2z6ea4/
lrwxrwxrwx 1 certik certik 38 Dec 3 17:14 py27 -> /local/certik/bld/profile/gc25gkuulirz/
lrwxrwxrwx 1 certik certik 38 Dec 3 17:12 py32 -> /local/certik/bld/profile/loglud5apymb/
lrwxrwxrwx 1 certik certik 38 Dec 3 16:52 py33 -> /local/certik/bld/profile/opqanpz7bwzd/
lrwxrwxrwx 1 certik certik 38 Dec 10 12:13 py34 -> /local/certik/bld/profile/2z6vwzjbxxkc/
lrwxrwxrwx 1 certik certik 38 Dec 4 13:12 python3.4.linux -> /local/certik/bld/profile/yt6dksb4bzp3/
lrwxrwxrwx 1 certik certik 38 Nov 5 11:07 rhel6 -> /local/certik/bld/profile/5vq4exlhbnuj
lrwxrwxrwx 1 certik certik 38 Sep 16 15:58 temp -> /local/certik/bld/profile/ehcfjfeibvh5
lrwxrwxrwx 1 certik certik 38 Dec 9 13:20 truchas -> /local/certik/bld/profile/vay33ptw4bl


I manually pruned the output of "ls" to only show the links to
profiles. I counted 13 of them.
A collaborator just sent me his Python 3 files (some Python modules
and an IPython notebook). I want to try them out. So I go to the /tmp/x
directory with the files, and:

certik@redhawk:/tmp/x$ export HASHSTACK=$HOME/repos/hashstack/py34
certik@redhawk:/tmp/x$ export PATH=$HASHSTACK/bin:$PATH

I can test that things work:

certik@redhawk:/tmp/x$ which python
/home/certik/repos/hashstack/py34/bin/python
certik@redhawk:/tmp/x$ which ipython
/home/certik/repos/hashstack/py34/bin/ipython

Now I can fire up the notebook and start playing with the code:

certik@redhawk:/tmp/x$ ipython notebook

or execute some scripts like this:

certik@redhawk:/tmp/x$ python eq.py

or run tests like:

certik@redhawk:/tmp/x$ nosetests

or compile my own package, for example csympy, using:

CC=gcc CXX=g++ cmake -DCOMMON_DIR=$HASHSTACK .

You can see that I can just use $HASHSTACK to point to my profile so
that cmake can find all the packages it needs. (The CC and CXX
variables must be explicitly set to use Hashstack's gcc 4.9.2
compiler.)

But every time I open a new terminal, I have to execute these two lines by hand:

certik@redhawk:/tmp/x$ export HASHSTACK=$HOME/repos/hashstack/py34
certik@redhawk:/tmp/x$ export PATH=$HASHSTACK/bin:$PATH

And sometimes, when I compile other projects with our own gcc, I also
need to do:

export LD_LIBRARY_PATH=$HASHSTACK/lib

The reason is that gcc creates an executable and links to its own
libraries from $HASHSTACK/lib, but it doesn't set rpath by default.
Sometimes cmake sets rpath to $HASHSTACK/lib, for example for csympy,
so then things work. But if it's some other project, it usually
doesn't work, so one either has to set LD_LIBRARY_PATH, which is
typically what the "module" system does on clusters when you load
compilers like gcc or Intel, or set rpath by hand in the third party
project, which typically requires some non-trivial modifications to
its build system.

Anyway, so now you see the whole motivation. We need some simple way
to load the "py34" profile, something like this: I open a new
terminal, go to some directory and do:

$ hit load py34

which will automatically execute:

$ export HASHSTACK=$HOME/repos/hashstack/py34
$ export PATH=$HASHSTACK/bin:$PATH

or I can do:

$ hit load -l py34

which will execute:

$ export HASHSTACK=$HOME/repos/hashstack/py34
$ export PATH=$HASHSTACK/bin:$PATH
$ export LD_LIBRARY_PATH=$HASHSTACK/lib

I think that "-l" should not be the default, as the default should be
to only add the profile's bin into your PATH and set the $HASHSTACK
variable, so that you can use it when compiling things by hand.
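
A minimal sketch of the proposal above, written in Python because hit itself is a Python tool. It only builds the lines that a "hit load" command might print; the function name load_commands and its with_ld_library_path parameter are hypothetical and not part of hashdist.

```python
import shlex


def load_commands(profile_dir, with_ld_library_path=False):
    """Return the shell lines a hypothetical `hit load` could emit.

    profile_dir is the path of the profile (what $HASHSTACK should point to).
    with_ld_library_path mirrors the proposed "-l" switch and is off by default.
    """
    lines = [
        # shlex.quote protects paths that contain spaces or shell metacharacters.
        "export HASHSTACK={}".format(shlex.quote(profile_dir)),
        'export PATH="$HASHSTACK/bin:$PATH"',
    ]
    if with_ld_library_path:
        lines.append('export LD_LIBRARY_PATH="$HASHSTACK/lib"')
    return lines


if __name__ == "__main__":
    print("\n".join(load_commands("/home/certik/repos/hashstack/py34", True)))
```

Keeping LD_LIBRARY_PATH behind an explicit switch matches the reasoning above: the default only touches PATH and HASHSTACK.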


So how can this be implemented? Well, the "hit" tool already knows about
all the profiles. The build store location is specified in:

$ cat ~/.hashdist/config.yaml
...
build_stores:
- dir: /local/certik/bld
...

and on my machine:

$ ls /local/certik/bld/profile/
2c7tgbzl6zss d2nhm3s6qzwn iseg62m2zp52 qs5rackjme57 x2yerpwugoig
2z6vwzjbxxkc ds27zngrz34y k4g2r7wjqcth rzwzr32hedq6 xn2b6rpftotj
3wikiogtpuvk dsajlbp2ry2o kkhizx2z6ea4 s3cudrw5vug6 y7igc5gkyxwd
3zdptwnkyltx e7jo56l7dj7s l4bo4nlbp4wq se43uhlf4tjo ydis3td427zv
47zq4zrngawy eus3hp732h5p loglud5apymb t5jlv37gxvrm yt6dksb4bzp3
4fgf2ogjhkd7 g5zxoyn62do6 migsq2gecx37 todmezptdgte ywrnwwtcdygj
4iliu4w2ix6z gc25gkuulirz olb7ljgn4wbu tqi64qtov54q zrh32h4azevd
72iy7qakrc2r gxfonbfmympc opqanpz7bwzd u43u27l7uyyd zunxx755a2ar
ajqoo7ualz25 heablz3wjtld otuwhghh4qqv ugbi2rtc5hxy
cw54a75gb7ad hfngjgdwgip4 pcso3baxrv5c vay33ptw4bl5
cxc5g2owcmrs i22r5zfknhaw q2pkrncp4yst vckm5fntplr3

Of course, I don't remember the exact hash for my py34 profile.
Maybe we can load a profile using a hash, but then I need some
easy way to determine which hash goes with the py34 profile. For now I
can do:

$ ll ~/repos/hashstack/py34
lrwxrwxrwx 1 certik certik 38 Dec 10 12:23
/home/certik/repos/hashstack/py34 ->
/local/certik/bld/profile/3wikiogtpuvk/

So the hash is 3wikiogtpuvk. So I think that loading the profile in
the following way can be implemented with our current infrastructure:

$ hit load 3wikiogtpuvk
$ hit load -l 3wikiogtpuvk



But I want more. I want to avoid ever typing ~/repos/hashstack/.
That's just my directory where I happen to work on hashstack, but when
I use it, I want to use it systemwide (i.e. under my user, not root),
even if I delete this directory. Because "hit" is systemwide, i.e.
it's not tied to the ~/repos/hashstack/ at all.

Just as in git you can reference commits by their hashes, we can
reference profiles by their hashes, and that should always work. We
should also add machinery to hit to list these hashes and to show
information about them, i.e. which packages and versions they contain.
For example, for the profile 3wikiogtpuvk above, here is how to list the
packages in it:

$ cat /local/certik/bld/profile/3wikiogtpuvk/artifact.json
{
"dependencies" : [
"berkeleydb-5/ezpn7nogtzm4rt7hdhujxhn3o7nyhlaq",
"blas/xsvwemqbi4dtutp326zzjpoldt62lqpa",
"bzip2/yhn7t7sdxdfdkhyie6hg36hqvgccj7rj",
"cmake/ugrko5ribtzlnijeygdngjxzk774ea6w",
"cython/6zzo4i57ge2h2m6okivmmwlikqejlly7",
"docutils/bwc6pilnr3lmkpmm3fwa7kfzqgzosio7",
"freetype/3j5vs5xri6335xcnnxh7kt3iudaos7lp",
"gdbm/xqni5hcjx43andearscwqprb54wxnopq",
"ipython/lgca343h4psoy6deajt4p5plae4mejw3",
"jinja2/wmepqmliwuqblmzdltzzuss4yu4plvzh",
"lapack/enbo6mjc6pkeoaoxnykphtro67cnqwqe",
"launcher/wvmh5uvayzv3bxqk5kirztzn3rckjyv7",
"matplotlib/2tcnl243dxoqvm2mzodim7omtpid7j37",
"mpi/hbd47qrzmwyecvdbonfalbn5xtqurzi3",
"ncurses/viognkgus4njgat77hcakdedqccl3wjy",
"nose/nh7kevvepzhzkjivodknnrbdysqiyc4o",
"numpy/lb3nwf7tc4vwqlcee433yjmjfxrsvhme",
"openssl/m6pttxckahscjt6jdev5acigozwa6ths",
"pandas/ouqxw6s77qc4uygkqs5fs2fj3dkcarm2",
"patchelf/k3rloj265ogtl4dmmmbmyt34dnffryka",
"pcre/2dpe5reczy3rt2jpx33hs2v675tofarr",
"perl/xskj3irapqvxclfj5vnb4kgqcygpbpmt",
"pkg-config/pjksilnbb4iyvrsemnzshe2d3o7uwpa5",
"png/q24b4y6ojqdwk2nwnblutd4qrjhmlmev",
"pygments/eppsfejfjdhfyvwusvq6tkqi7x6ro4xc",
"pyparsing/wfnngy2kxji5dba6jcnpgabf6jpr6dau",
"python-dateutil/v6trkz2y2v6cbbfu2glzvziaoxg7h4dy",
"python/5lt333yeqc64s24mir2qrtg5s3llw6ww",
"pytz/ey3bxm54wpdusoeapi5nyu6aiqiree3p",
"pyzmq/6ck7aqm5gzvrn357lnwwizxwr2b7j5yk",
"readline/5tdfuoei3z6ektgh7v7lh3ra36s32ssp",
"scipy/r2e7sdsky7ubu33glon4fy7qbrvrldty",
"setuptools/pauvgsufg5xi3fa5cybjtxaptjad2lgs",
"six/klhtav5h3nv5jrvwjuemhrvenjebz3ba",
"sphinx/yzkjert2tpkdmnkhuygysqqyqv3wdxiv",
"sqlite/m5jo67qgu6zfrjydvg3fj3c5zvguflsx",
"swig/c7uttrdwujdi5rnzxnjoohdr5x6u2aqi",
"sympy/eqjdbrilgi6odzk5q4ubxwryeha65vje",
"tornado/5a4nt55hfj6i46ifnksxvfhm3zbdpjym",
"zlib/3el5ccejre7bcjqgld5gp6iym4ccd5oe",
"zmq/ezabxw2ecth4wxornymnwdgfkh2d6wi7"
],
"id" : "profile/3wikiogtpuvkg3e3wcff3ymp6ps43wfm",
"name" : "profile",
"version" : "n"
}

So let's add this (and format it nicer) as something like:

$ hit show 3wikiogtpuvk
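
As a rough illustration of "format it nicer", the sketch below turns the contents of artifact.json into a readable listing. It relies only on the "id" and "dependencies" keys visible in the file above; the function name summarize_profile and the name@hash output style are assumptions made for this example.

```python
import json


def summarize_profile(artifact_json_text):
    """Render the text of a profile's artifact.json as a short listing."""
    artifact = json.loads(artifact_json_text)
    lines = ["Profile id: {}".format(artifact["id"]), "Packages:"]
    for dep in artifact["dependencies"]:
        # Each dependency is stored as "<package name>/<hash>".
        name, _, digest = dep.partition("/")
        lines.append("  {}@{}".format(name, digest))
    return "\n".join(lines)
```

A caller would read /local/certik/bld/profile/3wikiogtpuvk/artifact.json and pass its text to this function.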

Then I want to know which exact version of Python is used, so we need
something like:

$ hit show python/5lt333yeqc64s24mir2qrtg5s3llw6ww

Currently I can do this by hand:

$ grep "tar.gz" /local/certik/bld/python/5lt333yeqc64/build.json
"key" : "tar.gz:isr4d3y4psr6j7jfeqvpqdwxfwuucib4",

so at least I can see which tarball was used, so let's look into it:

$ file /local/certik/src/packs/tar.gz/isr4d3y4psr6j7jfeqvpqdwxfwuucib4
/local/certik/src/packs/tar.gz/isr4d3y4psr6j7jfeqvpqdwxfwuucib4: gzip
compressed data, from Unix, last modified: Wed Oct 8 02:24:41 2014,
max compression

Let's figure out which Python version that is:

$ tar tzf /local/certik/src/packs/tar.gz/isr4d3y4psr6j7jfeqvpqdwxfwuucib4 | head -n1
Python-3.4.2/

Ok, so now I know it's Python 3.4.2. All this is obviously cumbersome
to do by hand, but "hit" should provide a nice interface on top of all
this. It looks like the hit database has lots of information that
would be very useful to users.
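
The "tar tzf ... | head -n1" step can be reproduced with Python's standard tarfile module. This is only a sketch of the manual procedure above, not something hit provides; the function name top_level_name is hypothetical, and it assumes the source is a gzip-compressed tarball whose first member sits under a versioned directory such as Python-3.4.2/.

```python
import tarfile


def top_level_name(tarball_path):
    """Return the first path component of the first member in a .tar.gz file."""
    # "r:gz" forces gzip handling, so the file does not need a .tar.gz suffix.
    with tarfile.open(tarball_path, "r:gz") as archive:
        first = archive.next()
        if first is None:
            raise ValueError("empty archive: {}".format(tarball_path))
        return first.name.split("/", 1)[0]
```

Only the first member is read, so this stays cheap even for large source tarballs.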


However, in git it would be a big pain to address commits only by
their hashes. You can create branches and name them; essentially you
can name the hashes.

In the Hashstack directory, I actually have meaningful names, i.e.:

certik@redhawk:~/repos/hashstack(w)$ ll
total 192
lrwxrwxrwx 1 certik certik 38 Nov 7 17:33 basis -> /local/certik/bld/profile/k4g2r7wjqcth/
lrwxrwxrwx 1 certik certik 38 Dec 9 16:38 default -> /local/certik/bld/profile/x2yerpwugoig/
lrwxrwxrwx 1 certik certik 38 Oct 22 13:28 gtk -> /local/certik/bld/profile/mgz74vgzjtdp
lrwxrwxrwx 1 certik certik 38 Nov 7 17:32 hd_base -> /local/certik/bld/profile/d2nhm3s6qzwn/
...
lrwxrwxrwx 1 certik certik 38 Dec 10 12:13 py34 -> /local/certik/bld/profile/2z6vwzjbxxkc/
...

But this is only available in the hashstack directory. The hit
database doesn't seem to know about names like "py34", only the
hashes.

Would it be a problem for the hashdist database to also remember the
name of the profile? For example in:

$ cat /local/certik/bld/profile/3wikiogtpuvk/artifact.json
{
"dependencies" : [
"berkeleydb-5/ezpn7nogtzm4rt7hdhujxhn3o7nyhlaq",
"blas/xsvwemqbi4dtutp326zzjpoldt62lqpa",
...
"zmq/ezabxw2ecth4wxornymnwdgfkh2d6wi7"
],
"id" : "profile/3wikiogtpuvkg3e3wcff3ymp6ps43wfm",
"name" : "profile",
"version" : "n"
}

It has a "name" and the name is "profile". Why couldn't it be called
"py34" instead, after the name of the profile file, i.e. py34.yaml in my
hashstack directory? Git allows multiple branch names to point to the same
commit, so maybe we can add "names" and have multiple names for the same
profile hash. For example, I can create py34-scipy.yaml and play with
it, and not realize that I have (accidentally) created exactly the
same profile as py34.yaml, but I would still like to address it by
"py34-scipy", since that's what I am working with. But internally,
hashdist knows it's the same profile, so it can simply associate both
names with it.

Just like in git, you can delete branches. It actually doesn't delete
the commits, only the branch name. You can still recover the commits
from "git reflog". Eventually they get garbage collected, but that
takes like 2 weeks or so.

In the same way, I could do:

hit delete py34

This just deletes the name "py34" from the 3wikiogtpuvk profile.
If all names are deleted, the profile can eventually get
garbage collected.
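
The naming idea above (several names per profile hash, deleting a name without deleting the profile, garbage collection once no name is left) can be modelled in a few lines. This is an in-memory sketch only; the class ProfileNames and its methods are hypothetical and say nothing about how hashdist stores this information.

```python
class ProfileNames:
    """Toy registry that maps profile names to profile hashes."""

    def __init__(self):
        self._names = {}  # name -> profile hash

    def add(self, name, profile_hash):
        # Several names may point to the same hash, like git branches.
        self._names[name] = profile_hash

    def delete(self, name):
        # Removes the name only; the profile itself is left alone.
        del self._names[name]

    def names_for(self, profile_hash):
        return sorted(n for n, h in self._names.items() if h == profile_hash)

    def collectable(self, all_hashes):
        # Profiles that no name refers to are candidates for garbage collection.
        referenced = set(self._names.values())
        return sorted(set(all_hashes) - referenced)
```

For example, after adding "py34" and "py34-scipy" for 3wikiogtpuvk, deleting "py34" leaves the profile referenced; only after deleting "py34-scipy" as well does collectable() report it.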


Let me know. These are things that are seriously missing and we need a
robust way to do all the things above.

Ondrej

Aron Ahmadia

Dec 10, 2014, 3:25:33 PM
to Ondřej Čertík, hash...@googlegroups.com, Dag Sverre Seljebotn
Just to summarize your proposal (not in the same order):

* It would be awesome if we had some way to reference profiles by a name associated with them. The default behavior when hit creates a profile should be to register that profile somewhere in its directory.

* If we can reference built profiles by their names, we can easily do environment activations associated with that profile (setting two environment variables, possibly a third by command-line argument).

I'm +1 on both of these. To follow Git's example, you would keep a directory of file pointers to profiles. I think it would also be reasonable behavior to associate the profile's file name with its name, if the name doesn't already exist.
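
The "directory of file pointers" idea could look roughly like the sketch below: one small file per name whose content is the profile hash, similar to the files under .git/refs. The directory location, the file format and both function names are assumptions for illustration, not hashdist's actual layout.

```python
import os


def register_profile(refs_dir, name, profile_hash):
    """Create <refs_dir>/<name> pointing at profile_hash.

    Returns False and leaves the existing pointer untouched if the name
    is already taken.
    """
    os.makedirs(refs_dir, exist_ok=True)
    path = os.path.join(refs_dir, name)
    try:
        # Mode "x" fails if the file exists, so an existing name is never
        # overwritten.
        with open(path, "x") as ref:
            ref.write(profile_hash + "\n")
    except FileExistsError:
        return False
    return True


def lookup_profile(refs_dir, name):
    """Return the profile hash that a name points to."""
    with open(os.path.join(refs_dir, name)) as ref:
        return ref.read().strip()
```

Registering only when the name is free follows the "if the name doesn't already exist" rule suggested above.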



Jimmy Tang

Dec 11, 2014, 2:50:00 AM
to hash...@googlegroups.com, ondrej...@gmail.com, d.s.se...@astro.uio.no, ar...@ahmadia.net
I'd agree with the summary and original post so +1 on the ideas.

In relation to environment activation, it would obviously need to be pretty low-level and basic, e.g. hashdist could return a keypair and it would be up to the user to parse and set the environments or generate environment modules etc...

Jimmy

Ondřej Čertík

Dec 12, 2014, 5:21:31 PM
to Jimmy Tang, hash...@googlegroups.com, Dag Sverre Seljebotn, Aron Ahmadia
On Thu, Dec 11, 2014 at 12:50 AM, Jimmy Tang <jcf...@gmail.com> wrote:
> I'd agree with the summary and original post so +1 on the ideas.
>
> In relation to environment activation, it would obviously need to be pretty
> low-level and basic, e.g. hashdist could return a keypair and it would be up
> to the user to parse and set the environments or generate environment
> modules etc...

I agree that hashdist should allow this low-level api, so that the
user can use it in his higher level tools.

However, why are you against the following feature:

$ hit load py34

which will automatically execute:

$ export HASHSTACK=$HOME/repos/hashstack/py34
$ export PATH=$HASHSTACK/bin:$PATH


This is what ultimately will make `hit` easy to use. Otherwise I need
to somehow execute things by hand again, which is a lot of pain.

Ondrej

Jimmy Tang

Dec 13, 2014, 3:34:18 AM
to Ondřej Čertík, hash...@googlegroups.com, Dag Sverre Seljebotn, Aron Ahmadia
It does make it easier to use, but you will need to support the bash- and csh-based shells and potentially other oddball shells that people might have. In principle I am not disagreeing; I just think starting off with a lower-level API is more useful.

Jimmy 

Ondřej Čertík

Jan 8, 2015, 3:09:32 PM
to Jimmy Tang, hash...@googlegroups.com, Dag Sverre Seljebotn, Aron Ahmadia
Good point. Honestly I don't think it's that hard to support various
shells: each shell's syntax doesn't change, you just need to provide the
syntax for each shell, and there are only a few that most people use.
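
To make the per-shell point concrete, here is a small sketch that formats one variable assignment for two shell families. The function name format_setenv and the set of supported shells are assumptions. Note that shlex.quote targets POSIX shells, so its output is only a reasonable approximation for csh-style shells.

```python
import shlex


def format_setenv(shell, name, value):
    """Return the command that sets an environment variable in `shell`."""
    quoted = shlex.quote(value)
    if shell in ("sh", "bash", "zsh"):
        return "export {}={}".format(name, quoted)
    if shell in ("csh", "tcsh"):
        return "setenv {} {}".format(name, quoted)
    raise ValueError("unsupported shell: {}".format(shell))
```

The same (name, value) pairs can then be rendered for whichever shell the user runs, which keeps the low-level data separate from the shell-specific syntax.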



Question: is it accurate to view hashstack as some kind of "source" to
help hashdist specify the packages, but once 'hit build' is run in the
hashstack directory, then hashdist knows *everything* that it needs to
know and the hashstack directory is no longer needed?


With this mindset, we just need to make sure hashdist (hit) knows
about the names of profiles and anything else it needs. And then we
need to improve 'hit' based on my emails above.

Question: how do you all use hashdist when you build a profile, i.e.
a link in your hashstack directory? How do you "activate" it?

I'll try to implement the above soon. I just want to make sure
it's what we want.

Ondrej

Jimmy Tang

Jan 9, 2015, 5:14:41 AM
to Ondřej Čertík, hash...@googlegroups.com, Dag Sverre Seljebotn, Aron Ahmadia
The way I use it is this...

- I build a profile and compile everything up in ~/develop/hashstack-private/SOMEPROFILE
- I then add a few export lines in my .bashrc for exporting PATH, PYTHONPATH and PKG_CONFIG_PATH

I'm pretty boring with my setup as it just works and I tend not to vary things too much. I do at times switch branches in my hashstack-private directory if I want to experiment with updating my profile.

It's probably not the cleverest way of setting things up, but it means less work for me as long as I remember to be on the right branch in my hashstack-private directory.

Jimmy

Chris Kees

Jan 9, 2015, 10:40:36 AM
to Jimmy Tang, Ondřej Čertík, hash...@googlegroups.com, Dag Sverre Seljebotn, Aron Ahmadia
+1 on these proposals. I'm building profiles in the directory of the projects that depend on them and setting environment variables by hand. When I need to change the stack for a particular project I just delete the profile, change the stack, and rebuild, which is relatively fast due to the build artifact cache.

I agree that detecting the shell and supporting the shell-specific ways of setting PATH/LD_LIBRARY_PATH is worth the extra effort.

Chris


Dag Sverre Seljebotn

Jan 9, 2015, 2:05:31 PM
to Ondřej Čertík, Jimmy Tang, hash...@googlegroups.com, Aron Ahmadia
This is at least how it should work.

> With this mindset, we just need to make sure hashdist (hit) knows
> about the names of profiles and anything else it needs. And then we
> need to improve 'hit' based on my emails above.
>
> Question: how do you all use hashdist when you build a profile, i.e.
> a link in your hashstack directory? How do you "activate" it?
>
> I'll try get to implement the above soon. I just want to make sure
> it's what we want.

+1. As a first iteration I'd suggest something like

$ hit env --shell=bash myprofile

which simply prints to stdout:

export PATH="/home/dagss/myprofile/bin:${PATH}"

Then one would do this to import it into an environment:

source <(hit env --shell=bash myprofile)

Second iteration is shell auto-detection, third iteration perhaps
another command to spawn a sub-shell (that command could be "hit load",
or perhaps "hit use" ... nothing is loaded really, so..).

Creating a command that modifies your current environment (like
"modules" on clusters) is quite invasive (hit would need to be a shell
script function I believe...)

A question is what to export though. In a perfect world all that is
needed is to patch PATH and both PYTHONPATH and LD_LIBRARY_PATH etc. are
not needed if you, but we're probably not there yet... in which case I
prefer some JSON file inside the profile listing the variables to set?
Hard-coding PYTHONPATH and LD_LIBRARY_PATH in the `hit` tool wouldn't be
nice, that's something that should be specified in Hashstack packages
somehow.
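
One way to picture the "JSON file inside the profile" idea is sketched below. Everything about the file is assumed for illustration: the name env.json, and a format that maps each variable to a list of profile-relative directories to prepend. The function prints nothing itself; it returns bash-style lines like the "hit env --shell=bash" output shown above.

```python
import json
import os


def env_exports(profile_dir, manifest_name="env.json"):
    """Build bash export lines from a hypothetical per-profile JSON manifest.

    Assumed manifest format: {"PATH": ["bin"], "PYTHONPATH": ["lib/python"]}
    """
    with open(os.path.join(profile_dir, manifest_name)) as handle:
        manifest = json.load(handle)
    lines = []
    for var in sorted(manifest):
        dirs = [os.path.join(profile_dir, rel) for rel in manifest[var]]
        # Prepend the profile directories to whatever the variable already holds.
        lines.append('export {0}="{1}:${{{0}}}"'.format(var, ":".join(dirs)))
    return lines
```

With a manifest of {"PATH": ["bin"]} in /home/dagss/myprofile, this yields the same line as in the example above, and packages that need PYTHONPATH or LD_LIBRARY_PATH could declare them without hard-coding anything in hit.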

Dag Sverre

Ondřej Čertík

Jan 9, 2015, 3:08:16 PM
to Dag Sverre Seljebotn, Jimmy Tang, hash...@googlegroups.com, Aron Ahmadia
Thanks. I just realized this design yesterday.

>> With this mindset, we just need to make sure hashdist (hit) knows
>> about the names of profiles and anything else it needs. And then we
>> need to improve 'hit' based on my emails above.
>>
>> Question: how do you all use hashdist when you build a profile, i.e.
>> a link in your hashstack directory? How do you "activate" it?
>>
>> I'll try get to implement the above soon. I just want to make sure
>> it's what we want.
>
>
> +1. As a first iteration I'd suggest something like
>
> $ hit env --shell=bash myprofile
>
> which simply prints to stdout:
>
> export PATH="/home/dagss/myprofile/bin:${PATH}"
>
> Then one would do this to import it into an environment:
>
> source <(hit env --shell=bash myprofile)
>
> Second iteration is shell auto-detection, third iteration perhaps another
> command to spawn a sub-shell (that command could be "hit load", or perhaps
> "hit use" ... nothing is loaded really, so..).

Agreed.

>
> Creating a command that modifies your current environment (like "modules" on
> clusters) is quite invasive (hit would need to be a shell script function I
> believe...)

I think that's right, didn't think about that.

>
> A question is what to export though. In a perfect world all that is needed
> is to patch PATH and both PYTHONPATH and LD_LIBRARY_PATH etc. are not needed
> if you, but we're probably not there yet... in which case I prefer some JSON
> file inside the profile listing the variables to set? Hard-coding PYTHONPATH
> and LD_LIBRARY_PATH in the `hit` tool wouldn't be nice, that's something
> that should be specified in Hashstack packages somehow.

For my use cases, I found out that I need to set up:

* PATH (always)
* LD_LIBRARY_PATH (essentially only if I want to use gcc from the
profile, otherwise it's not needed --- maybe we'll be able to figure
out how to patch gcc, so that this is not needed, that would be great)
* HASHSTACK --- this is a path to the profile, so that when I build my
projects by hand, I can point cmake to libraries installed in my
profile. Essentially, using your example above,
HASHSTACK=/home/dagss/myprofile, PATH=$HASHSTACK/bin:$PATH and
LD_LIBRARY_PATH=$HASHSTACK/lib.

I never needed to set PYTHONPATH. So I think we can just export this
directly without JSON.

Ondrej

Ondřej Čertík

Apr 22, 2015, 3:36:09 PM
to hash...@googlegroups.com
Hi,

I've implemented all of this in
https://github.com/hashdist/hashdist/pull/325. Below I show how to use
it:
certik@redhawk:~$ hit list-profiles
List of installed profiles (profile_name@profile_hash):
basis@k4g2r7wjqcth
truchas@ws2qspqle76r
py32@loglud5apymb
py3@DELETED
py34@htvz74mbenv5
xx@d34x65pdbsxt
hd_base@bto56eplrk4v
b@DELETED
test@vndgqtgsmdth
rhel6@5vq4exlhbnuj
python3.4.linux@yt6dksb4bzp3
temp@ehcfjfeibvh5
gtk@mgz74vgzjtdp
py33@opqanpz7bwzd
default@lcbhm255ymgh
py27@pcnu7kh736at
numpy@kkhizx2z6ea4
truchas_bootstrap@frxnfq2suzik
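
For scripting on top of this listing, the name@hash lines are easy to parse. The sketch below is based only on the output shown above, so the header line and the DELETED marker are handled as they appear here; the function name parse_profiles is hypothetical.

```python
def parse_profiles(listing):
    """Parse `hit list-profiles` style text into {name: hash or None}."""
    profiles = {}
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of installed profiles (...):" header.
        if "@" not in line or line.endswith(":"):
            continue
        name, _, digest = line.rpartition("@")
        # A DELETED marker means the name no longer has a profile behind it.
        profiles[name] = None if digest == "DELETED" else digest
    return profiles
```

Applied to the listing above, the result would map "py34" to "htvz74mbenv5" and "py3" to None.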


>
>
> I manually pruned the output of "ls" to only show the links to
> profiles. I counted 13 of them.
> A collaborator just sent me his Python 3 files (some Python modules
> and IPython notebook). I want to try them out. So I go to /tmp/x
> directory with the files, and:
>
> certik@redhawk:/tmp/x$ export HASHSTACK=$HOME/repos/hashstack/py34
> certik@redhawk:/tmp/x$ export PATH=$HASHSTACK/bin:$PATH

certik@redhawk:~$ hit load py34
export HASHSTACK="/local/certik/bld/profile/htvz74mbenv5"
export PATH="${HASHSTACK}/bin":${PATH}
echo "Exporting HASHSTACK=$HASHSTACK"
echo "Adding \${HASHSTACK}/bin to PATH"
echo "Profile py34@htvz74mbenv5 loaded."

# To load the py34@htvz74mbenv5 profile, execute in Bash:
# . <(hit load py34)



certik@redhawk:~$ . <(hit load py34)
Exporting HASHSTACK=/local/certik/bld/profile/htvz74mbenv5
Adding ${HASHSTACK}/bin to PATH
Profile py34@htvz74mbenv5 loaded.


>
> I can test that things work:
>
> certik@redhawk:/tmp/x$ which python
> /home/certik/repos/hashstack/py34/bin/python
> certik@redhawk:/tmp/x$ which ipython
> /home/certik/repos/hashstack/py34/bin/ipython

certik@redhawk:~$ which python
/local/certik/bld/profile/htvz74mbenv5/bin/python
certik@redhawk:~$ which ipython
/local/certik/bld/profile/htvz74mbenv5/bin/ipython


>
> Now I can fire up the notebook and start playing with the code:
>
> certik@redhawk:/tmp/x$ ipython notebook
>
> or executing some scripts like this:
>
> certik@redhawk:/tmp/x$ python eq.py
>
> or run tests like:
>
> certik@redhawk:/tmp/x$ nosetests
>
> or compile my own package, for example csympy, using:
>
> CC=gcc CXX=g++ cmake -DCOMMON_DIR=$HASHSTACK .
>
> You can see that I can just use $HASHSTACK to point to my profile so
> that the cmake can find all the packages it needs. (The CC and CXX
> variables must be explicitly set to use Hashstack's gcc 4.9.2
> compiler.)
>
> But every time I open a new terminal, I have to execute these two lines by hand:
>
> certik@redhawk:/tmp/x$ export HASHSTACK=$HOME/repos/hashstack/py34
> certik@redhawk:/tmp/x$ export PATH=$HASHSTACK/bin:$PATH


Not any more, see above.

>
> And sometimes, when I compile other projects with our own gcc, I also
> need to do:
>
> export LD_LIBRARY_PATH=$HASHSTACK/lib
>
> The reason is that gcc creates an executable and links to its own
> libraries from $HASHSTACK/lib, but it doesn't set rpath by default.
> Sometimes cmake sets rpath to $HASHSTACK/lib, for example for csympy,
> so then things work. But if it's some other project, it usually
> doesn't work, so one either has to set LD_LIBRARY_PATH, which is
> typically what the "module" system does on clusters when you load
> compilers like gcc or Intel, or set rpath by hand in the third party
> project, which typically requires some non-trivial modifications to
> its build system.

LD_LIBRARY_PATH could also be set by the above easily, but it is
actually no longer necessary even for gcc; we have fixed that since then.

>
> Anyway, so now you see the whole motivation. We need some simple way
> to load the "py34" profile, something like this: I open a new
> terminal, go to some directory and do:
>
> $ hit load py34
>
> which will automatically execute:
>
> $ export HASHSTACK=$HOME/repos/hashstack/py34
> $ export PATH=$HASHSTACK/bin:$PATH
>
> or I can do:
>
> $ hit load -l py34
>
> which will execute:
>
> $ export HASHSTACK=$HOME/repos/hashstack/py34
> $ export PATH=$HASHSTACK/bin:$PATH
> $ export LD_LIBRARY_PATH=$HASHSTACK/lib
>
> I think that "-l" should not be the default, as the default should be
> to only add the profile's bin into your PATH and set the $HASHSTACK
> variable, so that you can use it when compiling things by hand.

All fixed.
Fixed.
Done:

certik@redhawk:~$ hit show py34
Information about the py34@htvz74mbenv5 profile:
Path: /local/certik/bld/profile/htvz74mbenv5
Full profile hash: htvz74mbenv5rlrhrrlb2rjybg3ykzzc
List of packages:
berkeleydb-5@ezpn7nogtzm4rt7hdhujxhn3o7nyhlaq
blas@xsvwemqbi4dtutp326zzjpoldt62lqpa
bzip2@yhn7t7sdxdfdkhyie6hg36hqvgccj7rj
cmake@ugrko5ribtzlnijeygdngjxzk774ea6w
cython@6zzo4i57ge2h2m6okivmmwlikqejlly7
docutils@bwc6pilnr3lmkpmm3fwa7kfzqgzosio7
freetype@3j5vs5xri6335xcnnxh7kt3iudaos7lp
gdbm@xqni5hcjx43andearscwqprb54wxnopq
ipython@lgca343h4psoy6deajt4p5plae4mejw3
jinja2@wmepqmliwuqblmzdltzzuss4yu4plvzh
lapack@enbo6mjc6pkeoaoxnykphtro67cnqwqe
launcher@wvmh5uvayzv3bxqk5kirztzn3rckjyv7
line_profiler@4gmcu554pv3pkloryw2k7nbrzrakjq2m
matplotlib@2tcnl243dxoqvm2mzodim7omtpid7j37
mpi@hbd47qrzmwyecvdbonfalbn5xtqurzi3
ncurses@viognkgus4njgat77hcakdedqccl3wjy
nose@nh7kevvepzhzkjivodknnrbdysqiyc4o
numpy@lb3nwf7tc4vwqlcee433yjmjfxrsvhme
openssl@m6pttxckahscjt6jdev5acigozwa6ths
pandas@ouqxw6s77qc4uygkqs5fs2fj3dkcarm2
patchelf@k3rloj265ogtl4dmmmbmyt34dnffryka
pcre@2dpe5reczy3rt2jpx33hs2v675tofarr
perl@xskj3irapqvxclfj5vnb4kgqcygpbpmt
pkg-config@pjksilnbb4iyvrsemnzshe2d3o7uwpa5
png@q24b4y6ojqdwk2nwnblutd4qrjhmlmev
pygments@eppsfejfjdhfyvwusvq6tkqi7x6ro4xc
pyparsing@wfnngy2kxji5dba6jcnpgabf6jpr6dau
python-dateutil@v6trkz2y2v6cbbfu2glzvziaoxg7h4dy
python@5lt333yeqc64s24mir2qrtg5s3llw6ww
pytz@ey3bxm54wpdusoeapi5nyu6aiqiree3p
pyzmq@6ck7aqm5gzvrn357lnwwizxwr2b7j5yk
readline@5tdfuoei3z6ektgh7v7lh3ra36s32ssp
scipy@r2e7sdsky7ubu33glon4fy7qbrvrldty
setuptools@pauvgsufg5xi3fa5cybjtxaptjad2lgs
six@klhtav5h3nv5jrvwjuemhrvenjebz3ba
sphinx@yzkjert2tpkdmnkhuygysqqyqv3wdxiv
sqlite@m5jo67qgu6zfrjydvg3fj3c5zvguflsx
swig@c7uttrdwujdi5rnzxnjoohdr5x6u2aqi
sympy@6vh445eecxwvkyi6rclnt42oksnifkfr
tornado@5a4nt55hfj6i46ifnksxvfhm3zbdpjym
yaml@vxeiq47kjbzo5lcw5bpz4nn4vrfscoet
zlib@3el5ccejre7bcjqgld5gp6iym4ccd5oe
zmq@ezabxw2ecth4wxornymnwdgfkh2d6wi7


>
> Then I want to know which exact version of Python is used, so we need
> something like:
>
> $ hit show python/5lt333yeqc64s24mir2qrtg5s3llw6ww

Done:

certik@redhawk:~$ hit show-package python@5lt333yeqc64s24mir2qrtg5s3llw6ww
Information about the python@5lt333yeqc64 package:
Path: /local/certik/bld/python/5lt333yeqc64
Full package hash: 5lt333yeqc64s24mir2qrtg5s3llw6ww
List of sources:
files:mmrpielmmakqy7nhexzkepv3tgn6g5sm
tar.gz:isr4d3y4psr6j7jfeqvpqdwxfwuucib4
List of dependencies:
bzip2@yhn7t7sdxdfdkhyie6hg36hqvgccj7rj
launcher@wvmh5uvayzv3bxqk5kirztzn3rckjyv7
ncurses@viognkgus4njgat77hcakdedqccl3wjy
openssl@m6pttxckahscjt6jdev5acigozwa6ths
patchelf@k3rloj265ogtl4dmmmbmyt34dnffryka
readline@5tdfuoei3z6ektgh7v7lh3ra36s32ssp
sqlite@m5jo67qgu6zfrjydvg3fj3c5zvguflsx
zlib@3el5ccejre7bcjqgld5gp6iym4ccd5oe


>
> currently I can do by hand:
>
> $ grep "tar.gz" /local/certik/bld/python/5lt333yeqc64/build.json
> "key" : "tar.gz:isr4d3y4psr6j7jfeqvpqdwxfwuucib4",
>
> so at least I can see which tarball was used, so let's look into it:
>
> $ file /local/certik/src/packs/tar.gz/isr4d3y4psr6j7jfeqvpqdwxfwuucib4
> /local/certik/src/packs/tar.gz/isr4d3y4psr6j7jfeqvpqdwxfwuucib4: gzip
> compressed data, from Unix, last modified: Wed Oct 8 02:24:41 2014,
> max compression
>
> Let's figure out which Python version that is:
>
> $ tar tzf /local/certik/src/packs/tar.gz/isr4d3y4psr6j7jfeqvpqdwxfwuucib4
> | head -n1
> Python-3.4.2/

The above shows the source, so one can still do this, but for versions
it's better to add a 'version' field into the yaml files. I've created
a new issue for this: https://github.com/hashdist/hashdist/issues/327

>
> Ok, so now I know it's Python 3.4.2. All this is obviously cumbersome
> to do by hand, but "hit" should provide nice interface on top of all
> this. It looks like the hit database has lots of information that
> would be very useful for users to start using.

Fixed.
It turns out that the "gc_roots" directory has the names of the
branches, so the above PR uses it. But a proper fix is to include the
names in the artifact.json instead.

>
> Just like in git, you can delete branches. It actually doesn't delete
> the commits, only the branch name. You can still recover the commits
> from "git reflog". Eventually they get garbage collected, but that
> takes like 2 weeks or so.
>
> In the same way, I could do:
>
> hit delete py34
>
> Which just deletes the name "py34" from the 3wikiogtpuvk profile.
> Eventually, if all names are deleted, the profile can eventually get
> garbage collected.
>
>
> Let me know. These are things that are seriously missing and we need a
> robust way to do all the things above.

Most of this is now implemented. Now we just need to polish it and
keep improving it.

Ondrej

Aron Ahmadia

Apr 22, 2015, 3:55:27 PM
to Ondřej Čertík, hash...@googlegroups.com
Wow. This looks awesome. I am sorry I'm on the sidelines right now :(

Jimmy Tang

Apr 23, 2015, 2:41:21 AM
to hash...@googlegroups.com
Great stuff!