HDEP 3

Dag Sverre Seljebotn

Apr 2, 2015, 4:23:10 PM
to hash...@googlegroups.com, mark florisson
So when me and Johannes and Mark met at Simula there were loads of
thoughts floating around, but here's a distillation of some of them:

https://github.com/hashdist/hashdist/wiki/HDEP-3:-Contracts

The goal is to clean up the parameter/profile system a bit (both
semantics and the Hashdist code) and we need to start in one corner or
the other. Comments both on technical merit and relevance welcome.

Dag Sverre

Chris Kees

Apr 6, 2015, 11:44:29 PM
to Dag Sverre Seljebotn, hash...@googlegroups.com, mark florisson
Thanks for posting that. It seems like a good idea to me. I guess the contract will allow us to write a kind of "abstract base class" for a class of packages and the constraints and parameters allow ways to resolve to a concrete package automatically.

For the case of reusing stack specs across machines, can you explain how it's going to fulfill contracts? I guess the idea is to have a package spec like

host-acml.yaml
-------------------
- when: machine=='CrayXE6'
  priority: 100

and then an algorithm picks the highest ranked package fulfilling the blas contract according to priorities. Would there also be a way to hardwire the choice for a machine/platform while still using the platform independent stack spec, or would that always be done indirectly through priorities? It seems like initially we'll want to support something completely deterministic like "strong advice" on how to fulfill the contract:

blas:
  - when: machine=='CrayXE6'
    use: host-acml
  - when: machine=='SGIICEX'
    use: mkl
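
To make the two mechanisms concrete, here is a minimal Python sketch of how priority-based selection and a hardwired per-machine choice could fit together. It is not Hashdist code: Candidate, resolve, the "*" wildcard and the "openblas" fallback are all names assumed for illustration only.

```python
# Hypothetical sketch; none of these names are Hashdist APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # package name, e.g. "host-acml"
    contract: str      # contract it fulfills, e.g. "blas"
    machine: str       # machine the rule applies to; "*" matches any
    priority: int = 0


def resolve(contract, machine, candidates, overrides=None):
    """Pick a package for `contract` on `machine`.

    An explicit override ("strong advice") wins. Otherwise the applicable
    candidate with the highest priority is chosen, with ties broken by
    name so the result does not depend on input order.
    """
    overrides = overrides or {}
    forced = overrides.get((contract, machine))
    if forced is not None:
        return forced
    applicable = [
        c for c in candidates
        if c.contract == contract and c.machine in (machine, "*")
    ]
    if not applicable:
        raise LookupError(f"no package fulfills {contract!r} on {machine!r}")
    best = min(applicable, key=lambda c: (-c.priority, c.name))
    return best.name


CANDIDATES = [
    Candidate("host-acml", "blas", "CrayXE6", priority=100),
    Candidate("openblas", "blas", "*", priority=10),  # assumed fallback
]
OVERRIDES = {("blas", "SGIICEX"): "mkl"}

print(resolve("blas", "CrayXE6", CANDIDATES, OVERRIDES))  # host-acml
print(resolve("blas", "SGIICEX", CANDIDATES, OVERRIDES))  # mkl
print(resolve("blas", "laptop", CANDIDATES, OVERRIDES))   # openblas
```

In this sketch the override table is consulted first, so the indirect priority route is only used when no explicit choice exists for the machine.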

It would also be nice to make an "explicit" version of the stack spec available for reproducibility after all contracts have been fulfilled. Meaning a stack spec where all contract dependencies are passed as parameters. I'd like to preserve an easy way to link a git commit with an explicit/reproducible stack spec.

Chris

Dag Sverre Seljebotn

Apr 7, 2015, 12:16:24 PM
to Chris Kees, hash...@googlegroups.com, mark florisson
Hi Chris, thanks for the feedback!

On 04/07/2015 05:44 AM, Chris Kees wrote:
> Thanks for posting that. It seems like a good idea to me. I guess the
> contract will allow us to write a kind of "abstract base class" for a
> class of packages and the constraints and parameters allow ways to
> resolve to a concrete package automatically.

That's a good way of putting it.

> For the case of reusing stack specs across machines, can you explain how
> it's going to fulfill contracts? I guess the idea is to have a package
> spec like
>
> host-acml.yaml
> -------------------
> - when: machine=='CrayXE6'
>   priority: 100

If "CrayXE6" is a really specific setup (perhaps it's even DoD's CrayXE6
or Chris' CrayXE6?) then I think we should find a way to allow you to
put everything that is specific to CrayXE6 in a single "mixin" file. So
IMO we want something on the profile spec level, not the recipe level.
(We could add support for when-clauses in profile specs too.)

> indirectly through priorities? It seems like initially we'll want to
> support something completely deterministic like "strong advice" on how
> to fulfill the contract:
>
> blas:
>   - when: machine=='CrayXE6'
>     use: host-acml
>   - when: machine=='SGIICEX'
>     use: mkl

Ah right. The spec covers strong advice for specific packages (you
should be able to do something like the above for NumPy only, then
repeat for FEnICS only, etc.) but the spec doesn't have a "strong advice
for default".

The syntax above would work, but only adds this feature for "package
parameters". What if you want to change other defaults such as
optimization level?

So how about putting that in our "parameters" section? You just set the
"blas" parameters thusly:

parameters:
  when machine == 'CrayXE6':
    blas: host-acml
    optlevel: -O0
  when machine == 'SGIICEX':
    blas: mkl
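
As a rough illustration of the semantics I have in mind, the following Python sketch applies such when-clauses on top of a set of defaults. The function name, the (condition, settings) representation and the default values are assumptions made up for this example, not the actual Hashdist implementation.

```python
# Hypothetical sketch; not the Hashdist parameter implementation.
def apply_when_clauses(defaults, clauses, context):
    """Return `defaults` updated by every clause whose condition holds.

    `clauses` is an ordered list of (condition, settings) pairs; later
    matching clauses override earlier ones.
    """
    resolved = dict(defaults)
    for condition, settings in clauses:
        # eval() keeps the sketch short. Conditions are trusted spec
        # text, and builtins are disabled.
        if eval(condition, {"__builtins__": {}}, dict(context)):
            resolved.update(settings)
    return resolved


DEFAULTS = {"blas": "openblas", "optlevel": "-O2"}  # assumed defaults
CLAUSES = [
    ("machine == 'CrayXE6'", {"blas": "host-acml", "optlevel": "-O0"}),
    ("machine == 'SGIICEX'", {"blas": "mkl"}),
]

print(apply_when_clauses(DEFAULTS, CLAUSES, {"machine": "CrayXE6"}))
print(apply_when_clauses(DEFAULTS, CLAUSES, {"machine": "SGIICEX"}))
```

The point of the design is that "blas" is treated like any other parameter, so the same mechanism covers package choices and things like optimization level.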

> It would also be nice to make an "explicit" version of the stack spec
> available for reproducibility after all contracts have been fulfilled.
> Meaning a stack spec where all contract dependencies are passed as
> parameters. I'd like to preserve an easy way to link a git commit with
> an explicit/reproducible stack spec.

For reproducibility I think I'd prefer (if possible) making the
resolution algorithm very deterministic so that you always end up with
the same stack given same parameters, then you really can link to a git
commit with an "explicit" stack, there's just some auto-resolution
"syntax candy".

But if this is too difficult to pull off with SAT solvers and whatnot,
and Hashdist needs to add some basically random element to the packages
it chooses, then what you say is needed for reproducibility.

Anyway, dumping the intermediate resolved stack representation to file
would be very useful for debugging, so what you ask for should probably
be supported anyway, it's just I ideally want to make that unnecessary.
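
A minimal sketch of what such a dump could look like, assuming the resolved stack is a plain nested dict (an assumption for this example; the real intermediate representation may differ). Serializing with sorted keys makes the dump independent of the order in which the resolver happened to produce things, so two identical resolutions give byte-identical files and the same digest.

```python
# Hypothetical sketch; dump_resolved and stack_id are illustrative names.
import hashlib
import json


def dump_resolved(resolved):
    """Serialize a resolved stack canonically (sorted keys, fixed indent)."""
    return json.dumps(resolved, sort_keys=True, indent=2)


def stack_id(resolved):
    """Short digest of the canonical dump; equal stacks give equal ids."""
    data = dump_resolved(resolved).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:12]


a = {"numpy": {"blas": "host-acml", "optlevel": "-O0"}, "blas": "host-acml"}
b = {"blas": "host-acml", "numpy": {"optlevel": "-O0", "blas": "host-acml"}}

assert stack_id(a) == stack_id(b)  # key order does not matter
print(dump_resolved(a))
```

The dumped file could then be committed next to the stack spec for debugging or for linking a git commit to an explicit stack.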


Dag Sverre

Dag Sverre Seljebotn

Apr 7, 2015, 2:11:24 PM
to hash...@googlegroups.com
On 04/07/2015 06:16 PM, Dag Sverre Seljebotn wrote:
> Hi Chris, thanks for the feedback!
>
> On 04/07/2015 05:44 AM, Chris Kees wrote:
>> Thanks for posting that. It seems like a good idea to me. I guess the
>> contract will allow us to write a kind of "abstract base class" for a
>> class of packages and the constraints and parameters allow ways to
>> resolve to a concrete package automatically.
>
> That's a good way of putting it.

And to complete the summary, the other main point of the proposal is
passing dependencies of a package (that are abstract/contracts) into the
package build as parameters.

Dag Sverre

Dag Sverre Seljebotn

Apr 24, 2015, 9:12:59 AM
to hash...@googlegroups.com
FYI, I'll spend some weeks coding on Hashdist starting in early May. I think it
still makes sense to move forward on HDEP 3; several of the features
Ondrej reports as missing in Hashdist (version and compiler) will be
solved by working on this part of the Hashdist code base. The big
difference is that rather than hard-coding version and compiler as special
properties, we open up for defining one's own ("ccompiler" is one
contract, "gpucompiler" another, and so on).

I'm considering making the contract spec format Python code, opinions on
that? I sort of liked the part of Spack with using nice clean
descriptions in Python by writing a class, and wouldn't mind going more
in that direction. I guess if we do that it should sort of imply opening
up for the possibility of more pure-Python package specs, without any
specific timeline and always being backwards compatible with the YAML
files..

So the example would look something like

import subprocess

from hashdist import *

class blas(contract):
    multithreading = parameter(bool)
    efficiency = parameter(int)

    exports_build_env_vars = ['BLAS']

    @check  # Probably not implemented at first but this is what it
            # could look like
    def check_pkg_config(self, param_values):
        """
        A package implementing BLAS should
        """
        # This is run during Hashdist build process to verify that
        # packages implement the contract; check_call takes the command
        # as a list and raises CalledProcessError if pkg-config fails
        subprocess.check_call(['pkg-config', '--exists', 'blas'])

And for the compiler:

class ccompiler(contract):
    """
    This contract basically just declares the 'ccompiler' parameter on
    packages that have the contract as a dependency.
    """
    exports_build_env_vars = ['CC']
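
None of `contract`, `parameter` or `check` exist in Hashdist today, so to show that the class-based format is implementable with very little machinery, here is a self-contained sketch of what the two base classes could look like. Everything in it (the class bodies, declared_parameters, validate) is assumed for illustration, and the `check` decorator is left out.

```python
# Hypothetical sketch of the base classes; not an existing Hashdist API.
class parameter:
    """Declares a typed parameter on a contract."""

    def __init__(self, type_):
        self.type = type_


class contract:
    """Base class; subclasses list parameters as class attributes."""

    exports_build_env_vars = []

    @classmethod
    def declared_parameters(cls):
        # Walk the MRO so parameters of parent contracts are included.
        found = {}
        for klass in reversed(cls.__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, parameter):
                    found[name] = value.type
        return found

    @classmethod
    def validate(cls, values):
        """Raise KeyError/TypeError if `values` do not match the declaration."""
        declared = cls.declared_parameters()
        for name, value in values.items():
            if name not in declared:
                raise KeyError(f"{cls.__name__} has no parameter {name!r}")
            if not isinstance(value, declared[name]):
                raise TypeError(f"{name} must be {declared[name].__name__}")


class blas(contract):
    multithreading = parameter(bool)
    efficiency = parameter(int)
    exports_build_env_vars = ['BLAS']


print(blas.declared_parameters())
blas.validate({"multithreading": True, "efficiency": 3})
```

A spec written this way stays declarative: the class body only lists parameters, and the framework decides when to inspect or validate them.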



Dag Sverre

Ondřej Čertík

Apr 25, 2015, 12:15:44 PM
to Dag Sverre Seljebotn, hash...@googlegroups.com
I actually like that we have a domain specific language using the yaml
files. I don't want Hashstack to become like SCons, where people just
keep hacking in Python, instead of implementing/improving the
underlying support in SCons (as opposed to CMake, which forces people
to hack less, though people still hack CMake if they need to) --- in other
words, I want fewer .py files in Hashstack, not more. Each .py file in
Hashstack means that the underlying support in Hashdist is missing
some feature, so one has to hack around using .py. We should still
allow that, but it should be used as a temporary workaround, until we
implement better support in Hashdist itself.

In my understanding, this was our design choice at the beginning. I am
open to changing direction of Hashdist, but in that case I would like
to better understand our initial reasoning, why we chose to use yaml
files instead of just Python files, and then what made us change this
to actually just use Python files.

Ondrej

Dag Sverre Seljebotn

May 28, 2015, 2:18:37 PM
to Ondřej Čertík, hash...@googlegroups.com
(This is motivated by my proposal to make it mandatory that packages
have Python identifiers as names, see
https://github.com/hashdist/hashstack/issues/796)

I am fairly certain contracts will be YAML now, and also the parameter
refactoring turned out to be a lot more important than contracts,
contracts were just a guiding star.. but we should revisit the language
part of this discussion:


On 04/25/2015 06:15 PM, Ondřej Čertík wrote:
> On Fri, Apr 24, 2015 at 7:12 AM, Dag Sverre Seljebotn
> <d.s.se...@astro.uio.no> wrote:
> I actually like that we have a domain specific language using the yaml
> files. I don't want Hashstack to become like SCons, where people just
> keep hacking in Python, instead of implementing/improving the
> underlying support in SCons (as opposed to CMake, which forces people
> to hack less, though people still hack CMake if they need to) --- in other
> words, I want fewer .py files in Hashstack, not more. Each .py file in
> Hashstack means that the underlying support in Hashdist is missing
> some feature, so one has to hack around using .py. We should still
> allow that, but it should be used as a temporary workaround, until we
> implement better support in Hashdist itself.

This is all IMO of course..

I don't know if the reason was very well motivated; somehow it was a lot
easier to think in YAML than in Python for what we wanted to do,
especially since Python is not a strictly functional language, while our
YAML specs are. There are many ways of doing it in Python that were
never visited though, so YAML was more like the quick and dirty hack
that allowed us to not think too much.

I'm fairly certain that it was never the intention in that workshop that
Python hooks were "temporary workarounds" until the YAML was powerful
enough. That would mean developing our own full-fledged programming
language. How is that better than using one of the existing ones?

http://xkcd.com/927/

That said, the hook API should of course be improved. And perhaps there
could even be some degree of language-agnosticism in that you could spec
one package using Python and one using Julia and one using Ruby. But I'm
no fan of making our own. (So, e.g., in the YAML file it could say
`expr_language: julia`, and now everything in {{ }} would be Julia
instead of Python.)
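
To sketch how a pluggable expression language could be wired in, here is a small Python example that expands {{ }} expressions through an evaluator looked up by name. The EVALUATORS table, the expand function and the sample parameters are assumptions for illustration; only the `expr_language` key and the {{ }} syntax come from the idea above.

```python
# Hypothetical sketch; not the Hashdist expression machinery.
import re

_EXPR = re.compile(r"\{\{(.*?)\}\}")


def _eval_python(expr, params):
    # Trusted spec text only; builtins are disabled to keep expressions simple.
    return eval(expr, {"__builtins__": {}}, dict(params))


# Other languages would register their own evaluator here.
EVALUATORS = {"python": _eval_python}


def expand(text, params, expr_language="python"):
    """Replace every {{ expr }} in `text` with the evaluated expression."""
    try:
        evaluate = EVALUATORS[expr_language]
    except KeyError:
        raise ValueError(f"no evaluator for expr_language {expr_language!r}") from None
    return _EXPR.sub(lambda m: str(evaluate(m.group(1).strip(), params)), text)


print(expand("--with-blas={{ blas }} -j{{ nproc * 2 }}",
             {"blas": "mkl", "nproc": 4}))
# prints: --with-blas=mkl -j8
```

A Julia or Ruby evaluator would only need to implement the same (expr, params) interface, which keeps the YAML layer itself language-agnostic.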

Basically Hashdist is a programmatic way of generating stacks; our
Hashstack is a software program. That is really more than what can be
said about Debian/RedHat etc. where the only programmatic element is in
the build phase, not in the specification phase.

Therefore every time we add features we take steps to make Hashdist
closer to a real programming language...

Nix and Guix and Spack all do this too, and they all use proper
programming languages (its own complete language, Scheme, and Python,
respectively).

I think in isolation, Scheme or Haskell might have been a better
foundation. Julia could be good but is a monstrous dependency to have.
Python has a nice, relatively easily compiled interpreter, is well
known in the scientific community, and has a C/JavaScript-like syntax.

So I don't mind us tying our knot more tightly with Python in general.
YAML is nice because things that would be very verbose and not very
idiomatic in Python can be captured quite nicely, but we *do* need hooks
for eternity IMO, since the alternative is making our own programming
language in our YAML files.


Dag Sverre