Community build service (starting with Windows)

Travis Oliphant

Nov 11, 2013, 6:31:12 PM
to numf...@googlegroups.com, cgo...@uci.edu
Hey all, 

Building for Windows is a pain in the neck, as most who have had to do it can attest.  Companies like Continuum and Enthought typically have to dedicate full-time resources to it to make sure it happens for their distributions.  Christoph Gohlke has done an amazing service with his Windows installers.

In a recent Twitter conversation, Wes McKinney emphasized how difficult it is for pandas, for example, to continue to provide Windows binaries for their project.  I have seen the same problem for NumPy and SciPy, and I'm quite confident that other projects feel the same pain.  The move to supporting both Python 2 *and* 3 will only amplify that pain for every open source developer shipping a project.

The time is right to solve this problem.  I think we can use NumFOCUS as a coordinating organization to work with companies like Rackspace, Intel, and Microsoft and with community leaders to provide a binary build service for all the NumPy-stack projects.  I think we can use the public conda recipes for creating Windows packages, and Continuum is willing to provide any insight we have from our current Anaconda recipes.

At its core, this means at least:

1) Provisioned machines for building on at least Windows, Mac, and Linux (but starting with Windows);
2) Public, readily available recipes for building on the major platforms;
3) Good Fortran/C/C++ compilers available on these machines.

The goal would be to allow developers to get "mostly automatic" builds on all platforms.   Some tweaking would be necessary to really get things to work, but between all the people already involved we could create "standard build recipes" for all packages and then allow access to build dev machines to help debug the trickier builds.  

One thing that has to be resolved of course is "what to build": 

I would strongly encourage and support building conda packages.  We have already started some work on a build service that creates conda packages for other platforms from those uploaded for one platform.  The intent is for this service to be free for open source projects, but we would be happy for a NumFOCUS-supported program to exist.  Our reason for doing this is to help the community.

We may need to be able to generate .whl packages as well (conda allows packaging things whl was not intended for --- Python itself, for example, but also the C and C++ libraries that are needed for many scientific packages).  Regardless, it is straightforward to produce .whl packages from conda binary packages, so once conda packages are built, .whl packages are easy to make from them.

We may also need to generate "Windows executable installers," though I'd really like to deprecate those, as I don't think they are the right answer for Windows.  Conda (and even pip with .whl files) will be a much better approach for Windows users.

binstar.org is already available as a binary artifact repository for use by the system.  

We need someone to head this up.   If you are interested, please respond either publicly on this list or to ad...@numfocus.org or to me personally if you would like discretion.   Perhaps NumFOCUS could buy someone's teaching time from them. 

We can work with vendors to get machine time and necessary compilers and I'm sure once we advertise this program we can get money from industry as well.   But, we will need resources for the person to head this up.   

I estimate the cost of this as at least $80k to get started (mostly in the time of someone heading this up) and about $20k / year in ongoing costs.  I would welcome other estimates.  I suspect I'm being too lean.

Continuum cannot foot the entire bill for this beyond what we have already done in creating conda and Anaconda --- and I suspect nobody else can either.    But, together we can pull this off as a community. 

Best,

-Travis

For those needing a refresher on just how much conda does for this problem, here are my slides on conda presented at the recent PyData Conference:  https://speakerdeck.com/teoliphant/packaging-and-deployment-with-conda



David Cournapeau

Nov 11, 2013, 7:14:15 PM
to numf...@googlegroups.com, cgo...@uci.edu
As I mentioned on Twitter, travis-ci is close to an ideal solution.

It works extremely well on Linux; the remaining issues are mostly about resources, and any other system will have the same issue. They have already figured out lots of problems, like how to avoid using too many resources on each job slave (an issue plaguing Jenkins-based systems), the distributed aspects (distributed logging, etc.), and the re-provisioning of the slaves through Chef (https://github.com/travis-ci/travis-cookbooks).

Making this work on Windows *reliably*, in a reproducible way, is a lot of work, and I suspect travis-ci is mostly there now. I myself have a proof of concept of a completely automated Packer + Vagrant + Chef setup for a VM with MinGW, and there is no reason that could not work on travis-ci (which used to be based on Vagrant, but it looks like they changed).

This setup also allows for matching up remote and local setups, which is crucial when debugging. Trying to reinvent the wheel there for open source projects is a mistake IMO, because solving this at a non-trivial scale requires lots of sysadmin skills, which we culturally lack (to get an idea of the problems involved, people can take a look at the travis-ci blog: http://about.travis-ci.org/blog/).

My 2 cents as they say,
David



Travis Oliphant

Nov 11, 2013, 7:29:57 PM
to numf...@googlegroups.com, Christoph Gohlke
I appreciate this input,  but I don't think we should just wait for Travis-CI to solve this unless they are directly a part of this conversation and we have some control over the solution they provide.   

There are many approaches to solving these issues (why not Ansible or Puppet instead of Chef), and the needs of the NumFOCUS community are specific enough that they deserve their own solution.

We also have great support from Microsoft, which already has resources available to provide us with provisioned Windows VMs with Visual Studio.

Matching remote and local setup is really quite trivial with conda and standard VMs (we do this all the time).  

And I don't think we lack the needed skills.  We actually have them in spades; they are just spread across multiple organizations that lack money.  If someone donated $100k, we at Continuum could do this in 4 months --- we have all the people we need to do it.  Rackspace and Red Hat also have these skills, and it is not difficult to get their help.  We just have to have a clear goal in mind.

We really just need the resources raised --- because with the money, we have the people who could do it.

-Travis
 
--

Travis Oliphant
CEO
Continuum Analytics, Inc.

David Cournapeau

Nov 11, 2013, 7:48:26 PM
to numf...@googlegroups.com, Christoph Gohlke
On Tue, Nov 12, 2013 at 12:29 AM, Travis Oliphant <tra...@continuum.io> wrote:
I appreciate this input,  but I don't think we should just wait for Travis-CI to solve this unless they are directly a part of this conversation and we have some control over the solution they provide.   

There are many approaches to solving these issues (why not Ansible or Puppet instead of Chef), and the needs of the NumFOCUS community are specific enough that they deserve their own solution.

Ansible does not work on Windows, and Puppet lacks many basic features (I actually don't like Chef very much, but it's the one that works best today on Windows).

What is specific to NumFOCUS that is not solved by travis-ci? Most scipy projects already use travis-ci effectively (numpy, scipy, sklearn, and pandas do at least), but it lacks OS X and Windows support. OS X is a function of money, I suspect (since that's supported by travis-ci already), and travis-ci has mentioned on their mailing list that they welcome help to make Windows happen.
 

We also have great support from Microsoft who has resources available to provide to us already provisioned Windows VMs with Visual Studio.   

Matching remote and local setup is really quite trivial with conda and standard VMs (we do this all the time). 

I am still not sure I understand the link with conda. Building numpy, scipy, etc. doesn't require any 'package' beyond themselves. The problem is setting up the dev environment (compiler, runtime, etc.). If travis-ci worked on Windows, and the numpy/scipy/etc. travis configs could push wheels somewhere, making scikit-learn, pandas, etc. easy to build would be fairly straightforward.

David 

Aaron Meurer

Nov 11, 2013, 7:57:42 PM
to numf...@googlegroups.com, Christoph Gohlke
On Mon, Nov 11, 2013 at 5:48 PM, David Cournapeau <cour...@gmail.com> wrote:
>
>
>
> On Tue, Nov 12, 2013 at 12:29 AM, Travis Oliphant <tra...@continuum.io>
> wrote:
>>
>> I appreciate this input, but I don't think we should just wait for
>> Travis-CI to solve this unless they are directly a part of this conversation
>> and we have some control over the solution they provide.
>>
>> There are many approaches to solving these issues (why not ansible or
>> puppet instead of chef), and the needs of NumFOCUS community are specific
>> enough that it deserves it's own solution.
>
>
> Ansible does not work on windows, and puppet lacks many basic features (I
> actually don't like chef very much, but that's the one which works the best
> today for windows).
>
> What is specific to NumFOCUS that is not solved by travis-ci ? Most scipy
> projects already use travis-ci effectively (numpy, scipy, sklearn and pandas
> do at least), but it lacks os x and windows support. OS X is a function of
> money I suspect (since that's supported by travis-ci already), and travis-ci
> has mentioned on ML they welcome help to make windows happen.

Travis CI in many ways views Python as second class, compared to
languages that are more popular for them, like Ruby. As a simple example,
if you want to test a package that requires matplotlib, scipy, etc., you
have to install from source, because they don't have binary packages in
their repos for all the Python versions (another, better option is to
just skip their stuff and use Miniconda). Also consider this issue:
https://github.com/travis-ci/travis-ci/issues/1106. Travis CI *still*
hasn't updated PyPy from 1.9, even though 2.0 was released in May and
2.1 has been out since August. For SymPy, we have had to leave PyPy in
the expected-fail part of the build matrix, since several bugs in PyPy
1.9 cause the tests there to always fail.

It also took them forever to support Python 3.3, as I recall.

I'm all for using Travis CI if it can work, but I agree with Travis
(Oliphant) that unless we have full support from their community, it
will be a wasted effort.

Aaron Meurer

David Cournapeau

Nov 11, 2013, 8:05:04 PM
to numf...@googlegroups.com, Christoph Gohlke
On Tue, Nov 12, 2013 at 12:57 AM, Aaron Meurer <asme...@gmail.com> wrote:
On Mon, Nov 11, 2013 at 5:48 PM, David Cournapeau <cour...@gmail.com> wrote:
>
>
>
> On Tue, Nov 12, 2013 at 12:29 AM, Travis Oliphant <tra...@continuum.io>
> wrote:
>>
>> I appreciate this input,  but I don't think we should just wait for
>> Travis-CI to solve this unless they are directly a part of this conversation
>> and we have some control over the solution they provide.
>>
>> There are many approaches to solving these issues (why not ansible or
>> puppet instead of chef), and the needs of NumFOCUS community are specific
>> enough that it deserves it's own solution.
>
>
> Ansible does not work on windows, and puppet lacks many basic features (I
> actually don't like chef very much, but that's the one which works the best
> today for windows).
>
> What is specific to NumFOCUS that is not solved by travis-ci ? Most scipy
> projects already use travis-ci effectively (numpy, scipy, sklearn and pandas
> do at least), but it lacks os x and windows support. OS X is a function of
> money I suspect (since that's supported by travis-ci already), and travis-ci
> has mentioned on ML they welcome help to make windows happen.

Travis in many ways views Python as second class (over more popular
languages (for them) like Ruby). As a simple example, if you want to
test a package that requires matplotlib, scipy, etc., you have to
install from source, because they don't have binary packages in their
repos for all the Python versions

That's exactly what wheels are for.
 
(another, better option is to just
skip their stuff and use miniconda). Also consider this issue:
https://github.com/travis-ci/travis-ci/issues/1106. Travis CI *still*
hasn't updated PyPy from 1.9, even though 2.0 was released in May and
indeed 2.1 has been out since August. For SymPy, we have had to leave
PyPy in the expected fail part of the build matrix since several bugs
in PyPy 1.9 cause the tests there to always fail.

I doubt PyPy is a significant blocker for the scipy community...

David 

Aaron Meurer

Nov 11, 2013, 8:16:36 PM
to numf...@googlegroups.com, Christoph Gohlke
Aren't wheels Windows only?

>
>>
>> (another, better option is to just
>> skip their stuff and use miniconda). Also consider this issue:
>> https://github.com/travis-ci/travis-ci/issues/1106. Travis CI *still*
>> hasn't updated PyPy from 1.9, even though 2.0 was released in May and
>> indeed 2.1 has been out since August. For SymPy, we have had to leave
>> PyPy in the expected fail part of the build matrix since several bugs
>> in PyPy 1.9 cause the tests there to always fail.
>
>
> I doubt pypy is a significant blocker for the scipy community...

PyPy is just an example. As I mentioned, it was the same way with
Python 3.3 too. My point is that the Travis community doesn't have a
vested interest in Python (much less scientific Python). I'm not
convinced that they would fully understand the needs of the scientific
Python community. I'd be happy to be proven wrong, though, because as
you've noted, they've already solved many hard problems.

Aaron Meurer

David Cournapeau

Nov 11, 2013, 8:33:56 PM
to numf...@googlegroups.com, Christoph Gohlke
No. The only Windows-specific thing about them is that PyPI only accepts wheels for Windows (for now; this is being actively discussed on the distutils-sig as we speak).

To build a wheel, you just do pip install wheel and then python setup.py bdist_wheel; it works everywhere.
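For a typical setuptools-based project, the whole sequence is just the following (assuming the project's setup.py builds cleanly and a compiler is available for any C extensions):

    pip install wheel                # adds the bdist_wheel command to setuptools
    python setup.py bdist_wheel      # writes a .whl into dist/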
 

>
>>
>> (another, better option is to just
>> skip their stuff and use miniconda). Also consider this issue:
>> https://github.com/travis-ci/travis-ci/issues/1106. Travis CI *still*
>> hasn't updated PyPy from 1.9, even though 2.0 was released in May and
>> indeed 2.1 has been out since August. For SymPy, we have had to leave
>> PyPy in the expected fail part of the build matrix since several bugs
>> in PyPy 1.9 cause the tests there to always fail.
>
>
> I doubt pypy is a significant blocker for the scipy community...

PyPy is just an example. As I mentioned, it was the same way with
Python 3.3 too. My point is that the Travis community doesn't have a
vested interest in Python (much less scientific Python). I'm not
convinced that they would fully understand the needs of the scientific
Python community. I'd be happy to be proven wrong, though, because as
you've noted, they've already solved many hard problems.

Python 3.3 was released 5 weeks ago, and they have it now. That's pretty good.

We need to think about the counterfactual: how long will it take to get an infrastructure that works reliably and can be used by the community as a whole [1]? Can't this time be spent working on code instead of admin stuff?

David

[1] Back-of-the-envelope calculation: consider just numpy, scipy, sklearn, pandas, and skimage --- 5 core packages. Each of them has 3 OSes x 4 Python implementations x 2-3 configurations, each taking on the order of 10 minutes, assuming all the dependencies are installed (give or take): that's ~1500 minutes for one full pass, and each of these projects has multiple builds per day, so it would need around 10 VMs running constantly, with all the corresponding issues in terms of monitoring, restarting, admin, access...
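Spelling that arithmetic out (the 2.5-configuration average and the four passes per day are my assumptions):

    # Back-of-the-envelope: VM demand for continuous builds
    packages = 5          # numpy, scipy, sklearn, pandas, skimage
    oses = 3              # Linux, OS X, Windows
    pythons = 4           # Python implementations/versions
    configs = 2.5         # "2-3 configurations", averaged
    job_minutes = 10      # per job, with dependencies preinstalled

    full_pass = packages * oses * pythons * configs * job_minutes
    print(full_pass)                  # 1500.0 minutes for one full pass

    passes_per_day = 4                # assumption: a few pushes per project per day
    vm_minutes = full_pass * passes_per_day
    print(vm_minutes / (10 * 60))     # ~10 VMs kept busy ten hours a day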

Aaron Meurer

Nov 11, 2013, 8:40:40 PM
to numf...@googlegroups.com, Christoph Gohlke
So I'm confused. Is this service intended to be used as a continuous
build service for the git versions of the projects, or only for
building binaries of releases? Does the community have a need for
binary builds on Windows for the latest git HEAD of {numpy, scipy,
sklearn, etc.}?

Aaron Meurer

David Cournapeau

Nov 11, 2013, 9:23:43 PM
to numf...@googlegroups.com, Christoph Gohlke
Putting my former numpy release manager hat back on: building the binaries for a release is not the most time-consuming part. Testing is, especially on Windows, where few people track master. I suspect that's the main bottleneck for most of the projects we are talking about.

David

Ralf Gommers

Nov 12, 2013, 1:44:17 AM
to numf...@googlegroups.com, Christoph Gohlke
I don't think so. My impression is that Travis O. proposes a build service intended to improve only part of the release process --- so building every ~3 months instead of every ~6 hours.
 
Taking my former hat of numpy release manager, building the binaries for the release is not the most time consuming part. Testing is, especially on windows where few people track master. I suspect that's the main bottleneck for most projects we are talking about.

I agree. A build service for binaries would be helpful and make releasing easier, but reliable Windows CI with MinGW and MSVC+MKL would be a more significant improvement at this point.

Ralf

Jacob Barhak

Nov 12, 2013, 3:06:14 AM
to numf...@googlegroups.com, Christoph Gohlke
Hi David,
 
Thanks for quantifying the amount of computing power needed and for identifying bottlenecks. Automation using multiple VMs is the solution. 
 
Travis is right: the infrastructure and tools are available - the technology has matured enough to solve this problem, and perhaps a few others at the same time.
 
      Jacob


Wes McKinney

Nov 12, 2013, 10:37:17 AM
to numf...@googlegroups.com, Christoph Gohlke
Uh, what? Having nightly Windows builds for a wide array of projects
is the whole point of this discussion.

>>
>> Taking my former hat of numpy release manager, building the binaries for
>> the release is not the most time consuming part. Testing is, especially on
>> windows where few people track master. I suspect that's the main bottleneck
>> for most projects we are talking about.
>
>
> I agree. A build service for binaries would be helpful and make releasing
> easier, but reliable Windows CI with MinGW and MSVC+MKL would be a more
> significant improvement at this point.
>
> Ralf
>

Ralf Gommers

Nov 12, 2013, 5:28:20 PM
to numf...@googlegroups.com
I don't get that from Travis' email, but assuming you can read his mind better than I can, my response would be: it's the wrong goal. You'd have to build and maintain all the infrastructure for good Windows CI without having Windows CI.

Ralf

Travis Oliphant

Nov 12, 2013, 7:18:14 PM
to numf...@googlegroups.com
Getting to nightly Windows builds for a wide array of projects is a great goal, but to do it you first have to have the ability to build on a weekly or monthly basis, which is a good start.

Doing nightly builds will take more resources, as David pointed out.  It is still possible, but it will require more time and money --- time and money that several cloud providers could be interested in providing us at this point.  Provisioning Windows machines on Azure with the needed software is pretty easy.

To David's question about conda: conda is an important tool in this discussion because of the "conda build" command, which automates the creation of the build environment and the construction of the "thing" that is released.  It greatly simplifies the provisioning problem.  Coupling conda build with an Azure provisioning tool does solve the problem, and is easier to do than I believe he realizes.

Like I said, we do have the technical know-how and the resources to do this if we just go out and get some money and/or time from cloud vendors and others.

Right now, though, we need a project champion.  We are still looking for the person who wants to step up and lead this.

-Travis


Josef Pktd

Nov 12, 2013, 9:23:14 PM
to numf...@googlegroups.com, Christoph Gohlke


On Monday, November 11, 2013 7:57:42 PM UTC-5, Aaron Meurer wrote:
On Mon, Nov 11, 2013 at 5:48 PM, David Cournapeau <cour...@gmail.com> wrote:
>
>
>
> What is specific to NumFOCUS that is not solved by travis-ci ? Most scipy
> projects already use travis-ci effectively (numpy, scipy, sklearn and pandas
> do at least), but it lacks os x and windows support. OS X is a function of
> money I suspect (since that's supported by travis-ci already), and travis-ci
> has mentioned on ML they welcome help to make windows happen.

Travis in many ways views Python as second class (over more popular
langauages (for them) like Ruby). As a simple example, if you want to
test a package that requires matplotlib, scipy, etc., you have to
install from source, because they don't have binary packages in their
repos for all the Python versions (another, better option is to just
skip their stuff and use miniconda). 

There is no problem installing binary packages from a remote repo on Travis CI.
statsmodels uses the NeuroDebian repo to install binary packages that are not directly available.
Installing binaries is much faster than building the dependencies.

Travis CI also has nice integration with coveralls.io.
(Thanks to Yaroslav, who got the ball rolling for statsmodels and scipy.)

statsmodels is tested continuously on Travis CI's Ubuntu images, on nipy's Debian machines, and on python-xy Ubuntu machines.
python-xy also builds nightly packages for distribution on Ubuntu.

Skipper is building nightly binaries for Windows on a box in the corner, but without running the unit tests and giving feedback on them.
We only find out about problems on Apple computers after a release.

The big advantage of Travis CI is the instantaneous feedback with unit test results on GitHub.
What's missing: the binary packages are not available for download.

Extending the continuous testing to Windows would be a big help.
The second issue is actually providing Windows binaries; it would be good if we didn't have to rely on Skipper's time and Windows skills.

Josef


 

 

Aaron Meurer

Nov 12, 2013, 9:29:18 PM
to numf...@googlegroups.com, Christoph Gohlke
On Tue, Nov 12, 2013 at 7:23 PM, Josef Pktd <josef...@gmail.com> wrote:
>
>
> On Monday, November 11, 2013 7:57:42 PM UTC-5, Aaron Meurer wrote:
>>
>> On Mon, Nov 11, 2013 at 5:48 PM, David Cournapeau <cour...@gmail.com>
>> wrote:
>> >
>> >
>> >
>> > What is specific to NumFOCUS that is not solved by travis-ci ? Most
>> > scipy
>> > projects already use travis-ci effectively (numpy, scipy, sklearn and
>> > pandas
>> > do at least), but it lacks os x and windows support. OS X is a function
>> > of
>> > money I suspect (since that's supported by travis-ci already), and
>> > travis-ci
>> > has mentioned on ML they welcome help to make windows happen.
>>
>> Travis in many ways views Python as second class (over more popular
>> langauages (for them) like Ruby). As a simple example, if you want to
>> test a package that requires matplotlib, scipy, etc., you have to
>> install from source, because they don't have binary packages in their
>> repos for all the Python versions (another, better option is to just
>> skip their stuff and use miniconda).
>
>
> There is no problem to install binary packages from a remote repo on
> TravisCI.
> statsmodels is using the NeuroDebian repo to install binary packages that
> are not directly available.
> Installing binaries is much faster than building the dependencies.

Yes, but you have to actually *have* those binaries available. This
brings us full-circle. conda build is making this easier (I'm
currently working on using conda to install the optional dependencies
for SymPy). conda build + binstar makes it quite easy to do this,
except for the small detail that you have to actually have a Linux,
Windows, etc. machine to build the binary on in the first place.

And by the way, the less popular your dependency is, the less likely
you are to find some Debian repo that has it.

Aaron Meurer

>
> TravisCI also has nice integration with coveralls.io
> (Thanks to Yaroslav which got the ball rolling for statsmodels and scipy.)
>
> statsmodels is tested continuously on TravisCI Ubuntus on nipy's Debian
> machines and on python-xy Ubuntu machines.
> python-xy also builds nightly packages for distribution on Ubuntu.
>
> Skipper is building nightly binaries for Windows in a box in the corner, but
> without running and giving feedback on unit tests.
> We only find out about problems with Apple computers after a release.
>
> Big advantage of TravisCI is the instantaneous feedback with unit test
> results on github.
> Missing, binary packages are not available for download.
>
> Extending the continuous testing to Windows would be a big help.
> Second issue is to actually provide Windows binaries, which would be good if
> we didn't have to rely on Skipper's time and Windows skills.
>
> Josef
>
>
>
>
>
>

Josef Pktd

Nov 12, 2013, 11:29:38 PM
to numf...@googlegroups.com
I wouldn't know how to build anything on Linux.
And I'd rather work directly with our Debian/Ubuntu packagers, i.e., Yaroslav and Tim.

Yaroslav and NeuroDebian are **very** helpful both for releasing and testing. And I found all our dependencies there.

What is currently missing as systematic infrastructure is testing betas or release candidates of core packages like numpy, scipy, or pandas.
Running the test suites of the main packages, like the scikits, on a new beta or rc of numpy or scipy currently relies on individuals, especially Christoph Gohlke on Windows, and Ralf for statsmodels.

One current build problem for Windows releases is that the binaries of statsmodels and pandas are compiled against a recent release of numpy, and so are not binary compatible with older releases of numpy that are still officially supported.

 
And by the way, the less popular your dependency is, the less likely
you are going to be able to find some Debian repo that has it.

Also by the way: for statsmodels we are very picky about adding (required) dependencies, and when we do add them, Debian and Ubuntu pick them up.

In terms of distributions: if there were no Gohlke binaries, the situation for some packages would be pretty bad. For some packages, like cvxopt, which I recently installed on Windows, his were the only binaries available for the Python or numpy version that I needed.


---
The way I see it, we would need three different things.

- continuous testing on Windows (and Apple), which would ideally integrate with Travis CI, which the core packages are already using but which is Ubuntu-only.
- nightly or weekly binaries for Windows: for example, I currently cannot compile scipy, cannot run scipy master until the binaries for a beta or rc are available, and cannot test whether recent changes "screw up" my packages.
- binary releases: this is currently mostly available through Enthought, Continuum, python-xy, and Gohlke, although not in all version combinations; it will meet the requirements of most users.

Josef

David Cournapeau

Nov 13, 2013, 8:53:29 AM
to numf...@googlegroups.com, Christoph Gohlke
Having the binaries once you can build your package is trivial (it is literally one command: python setup.py bdist_wheel).

This
brings us full-circle. conda build is making this easier (I'm
currently working on using conda to install the optional dependencies
for SymPy). conda build + binstar makes it quite easy to do this,
except for the small detail that you have to actually have a Linux,
Windows, etc. machine to build the binary on in the first place.

Conda (or any other packager) is mostly orthogonal to the issue: conda build uses a recipe to build a conda package. Each package already knows how to build itself (hopefully...), and making a package from that is one command.

If the goal is helping the community on Windows (and OS X, to a lesser degree), the hard part is buying, provisioning, and maintaining the build environments. If you can provide that, you have done 95% of the work. Conda (or another packager) is the remaining 5%.

David

Aaron Meurer

Nov 13, 2013, 1:36:53 PM
to numf...@googlegroups.com, Christoph Gohlke
Not all things that this community needs to have built are Python
libraries. There are a ton of C/Fortran/etc. libraries that are
important to the scientific community. I agree that Python libraries
tend to be relatively easy to compile.

>
>> This
>> brings us full-circle. conda build is making this easier (I'm
>> currently working on using conda to install the optional dependencies
>> for SymPy). conda build + binstar makes it quite easy to do this,
>> except for the small detail that you have to actually have a Linux,
>> Windows, etc. machine to build the binary on in the first place.
>
>
> Conda (or any other packager) is mostly orthogonal to the issue: conda build
> uses a recipe to build a conda package. Since each package already knows how
> to build itself (hopefully...), and making a package is one command.

Since when is building a complex C library one simple command? Maybe
it's one command for Python libraries (and you pray that it just
works). Here's the "one command" to build Qt on Linux:
https://github.com/ContinuumIO/conda-recipes/blob/master/qt/build.sh#L5.
That's only after applying a patch to the source code and ensuring
that about a dozen libraries are installed (about half of which it
will gladly build a half-working binary without). Each of those flags
had to be figured out as something that has to be done to make it
work. I can't imagine the situation will be any simpler on Windows.

The whole point of conda build is *to make* building one command. But
it definitely isn't one command without it.

>
> If the goal is helping the community for windows (and os x to a lesser
> degree), the hard part is buying, provisioning and maintening the build
> environments. If you can provide that, you get 95 % of the work. Conda (or
> other packager), is the remaining 5 %.

The package manager is essential. You need to be able to build the
dependencies, and then reuse the binaries of those dependencies.

Also, conda is more than just a package manager + build tool. As has
been mentioned already, it handles the virtual environment creation as
well. Much of the provisioning and maintaining is already handled by
conda, especially for a first iteration of this build service.

Aaron Meurer

David Cournapeau

Nov 13, 2013, 2:13:49 PM
to numf...@googlegroups.com, Christoph Gohlke
All the packages we are talking about here have only one non-Python hard dependency: BLAS/LAPACK.
 

>
> If the goal is helping the community for windows (and os x to a lesser
> degree), the hard part is buying, provisioning and maintening the build
> environments. If you can provide that, you get 95 % of the work. Conda (or
> other packager), is the remaining 5 %.

The package manager is essential. You need to be able to build the
dependencies, and then reuse the binaries of those dependencies.

Let me rephrase: if travis-ci worked on Windows, and the last step of .travis.yml pushed wheels to some external repo available to third-party packages, you would already remove a huge burden. I know for sure this would be true for numpy and scipy, and the latter encompasses pretty much everything that's hard for the scientific tool stack.
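As a sketch, something like the hypothetical config below is what I mean --- note that the os: windows entry and the upload step are invented for illustration; neither exists on travis-ci today:

    # .travis.yml -- hypothetical; 'windows' workers and a community
    # wheel repository are assumptions, not current travis-ci features
    language: python
    os:
      - linux
      - windows
    python:
      - "2.7"
      - "3.3"
    install:
      - pip install wheel
    script:
      - python setup.py test
    after_success:
      - python setup.py bdist_wheel
      # push dist/*.whl to the shared wheel repo (tooling to be decided)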

If the package manager were the issue, why is the problem we are talking about mostly solved on Linux already?

David 

Jacob Barhak

Nov 13, 2013, 4:04:33 PM
to numf...@googlegroups.com, Christoph Gohlke
Hi David, Hi Aaron,
 
Aaron is right in his observation that there are many external dependencies beyond Python. David, some necessary libraries are hard to build across multiple environments. If you limit yourself to a subset of packages and environments, then the job is simple. Yet the more you extend your reach to more users, the harder the issue becomes. And some packages need improvements; a standardized building procedure may help improve their quality as well.
 
And David, you are right that the infrastructure to allow the builds is a major effort - yet there is still much work beyond the infrastructure. The larger the variety of packages you wish to support, the more sophisticated the tools you will need. In time these tools will be considered part of the infrastructure, yet since the job is not finished, it is too soon to predict percentages of work - there are still many small unknowns, and much of the work will be resolving many small issues.
 
In any case, it seems the discussion is about minor differences in implementation path - I understand that there is general agreement that the goal is worthy of the effort.
 
                 Jacob

Anthony Scopatz

Nov 14, 2013, 3:16:23 AM
to numf...@googlegroups.com, PyNE Dev, Christoph Gohlke
Hello All, 

I'd like to volunteer to be the person to spearhead this effort, or at least be the manager/organizer of it. Normally I wouldn't do this (I have enough on my plate), but the build and distribution problems have become the central issue for PyNE in the past 10 days [1]. On Monday, Nov. 4th, Katy hosted a PyNE workshop at UC Berkeley.  Out of the ~15 attendees, 2 could successfully install PyNE... even using Anaconda and/or Canopy and ignoring our optional dependencies. We absolutely have to solve this problem for the project to survive. The PyNE stack is fairly standard for this community.  The fact that we have had such terrible install problems is a really serious issue.  We are going to dedicate the next release to fixing this problem rather than doing any nuclear engineering.

Furthermore, I don't think that this is a wholly unique situation, as I know other codes, such as yt, have experienced similar difficulties in this space.

I believe that to fix this problem sufficiently we must have empathy for our users, rather than for ourselves as developers.  We need to be open to installing packages in the ways users are comfortable with, even though it may make more work for us.

Many of the problems, I feel, stem from wanting to support a public API for many languages simultaneously (C/C++/Python/Fortran).  PyNE is not unique in this way.  NumPy also does this.  However, the dependency pool for PyNE is much broader and deeper than for NumPy.  Additionally, as Aaron brings up, we aren't really just talking about supporting Python & Cython code.  By virtue of our dependencies, we also have to support packages in these compiled languages.

I have many ideas about how to go about this process, though it will require help from many people.  I believe all of the pieces are there to do this right, but what we need is a coordinated strategy, someone to carve up tasks appropriately, and someone to make sure that this process doesn't take too long.  Again, I am happy to be the cat-wrangler, if for no other reason than that PyNE will be going through this shortly.

Travis-CI, VMs, etc. have come up quite a bit so far on this thread.  As others have noted, this is the right idea but insufficiently executed.  Therefore, I'd like to direct folks to the Build & Test Lab (BaTLab). This is an NSF-funded organization whose purpose is to provide an integrated suite of platforms for building and testing scientific software.  Even though it is at UW-Madison, it is open to everyone and free to use.  The advantage here is that BaTLab jobs can take as long as they need to, and it already supports the platforms we need in many flavors: Windows, Mac, Linux.  For another project (cyclus), I am working on a little web app which exposes Travis-CI-like functionality but is backed by BaTLab.  Hopefully this will be in a working state by this weekend.  Basically, there is no need for us to build the infrastructure! Or at least no need to do so in the short- to medium-term.

I think if every project were able to donate a person or two to work on this for the next month we really could get something working that truly addressed or routed around some of the fundamental flaws with the current system.

In summary, we (PyNE +/- yt) can run off and solve this for ourselves -OR- we (NumFOCUS) can help pool our resources and try to create a quality solution that applies to everyone.  Sorry if this is a bit ranty, but it has become difficult for me to sell Python & PyNE because of building and distribution.  This is long overdue and I am willing to put my money where my mouth is.  If someone has better ideas, I'd love to hear them!

Be Well
Anthony

Till Stensitzki

Nov 14, 2013, 4:18:18 PM
to numf...@googlegroups.com, PyNE Dev, Christoph Gohlke
Hi,
A Windows user here. First, I want to thank C. Gohlke; without him I couldn't use a lot of Python packages.
I am probably not the only one using WinPython + Gohlke's packages.

One thing I want to emphasize: from time to time I have tried to build numpy with mingw-w64, and the last
time was the first time I got it to compile (but it crashed during the tests). This involved fixing a lot of old, known bugs in distutils
and commenting out parts of numpy.distutils. So one thing I would love as a by-product of the discussion
here is some kind of well-defined environment where at least numpy and scipy are easy to compile.
Given the homogeneity of Windows, this should be doable. I would also prefer that it not require MSVC,
but I don't know if this is possible.

thanks,
Till

David Cournapeau

Nov 14, 2013, 4:47:51 PM
to numf...@googlegroups.com, PyNE Dev, Christoph Gohlke
We have been working on and off with @ogrisel from scikit-learn on exactly this during the EuroSciPy / PyCon.fr sprints. I will try to put what we have on GitHub this weekend.

David 

Travis Oliphant

Nov 14, 2013, 9:21:36 PM
to numf...@googlegroups.com
Hey Anthony, 

BaTLab looks cool.  As a side note: do you think it would be open to receiving some machines from snakebite.org, which has some interesting versions of Unix that Python is currently being tested on?

Having these machines available is a big step.  The next step is a system that manages build jobs on those machines, which will allow rapid debugging of the build issues that arise.  With this, conda build is available as a command to finish the job.

I strongly suggest making a conda package for PyNE.   The other dependencies you list are already available in public repositories as conda packages for multiple machines (even cmake).   I think you would find your installation headaches eliminated.   There is no reason to solve the PyNE installation problem differently.  

It is quite easy to make a conda package.    See my talk for a simple summary (starting on slide 27 for build...)
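To illustrate the shape of it: a recipe is just a directory with a meta.yaml and per-platform build scripts, and conda build does the rest. The sketch below is hypothetical --- the version, source, and dependency list are placeholders, not a tested PyNE recipe:

    pyne/                 # hypothetical recipe directory
      meta.yaml
      build.sh            # Linux / Mac build script
      bld.bat             # Windows build script

    # meta.yaml (sketch; names and versions are placeholders)
    package:
      name: pyne
      version: 0.1
    source:
      git_url: https://github.com/pyne/pyne.git
    requirements:
      build:
        - python
        - numpy
        - cython
        - hdf5
      run:
        - python
        - numpy
        - hdf5

Running "conda build pyne/" then produces a binary package that can be uploaded to binstar.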


By the way, we are very close to a pip-installable conda package so that you can take *any* Python distribution and start using it to install conda environments and packages.  This works on Mac and Linux now, but we are still tracking down a few Windows issues.

-Travis



Wes McKinney

Nov 24, 2013, 10:10:34 PM
to numf...@googlegroups.com
What are next steps here? I can start asking people for funding for
the project but what should I tell them?

thanks,
Wes

Travis Oliphant

Nov 25, 2013, 12:07:05 AM
to numf...@googlegroups.com
Anthony said he could potentially push this forward.   Is this still true? 

What is needed is the donation of co-located hardware.  I think asking Microsoft for some Azure time would be the best approach, but other cloud or co-location providers could also step in here.

-Travis

Anthony Scopatz

Nov 25, 2013, 9:46:25 AM
to numf...@googlegroups.com, PyNE Dev, Aron Ahmadia
Hello Travis, Everyone, 

Sorry for taking so long to get back to everyone on this.  I have been busy both traveling and clearing stuff off of my plate to prepare for this task (see the PyNE mailing list).  I am definitely still planning on spearheading this effort, mostly out of my selfish needs for PyNE, yt, PyTables, etc.

Luckily, the travel time has given me ample opportunity for reflection on this topic.  I think that we have two problems that often get conflated into a single solution:

1. Reproducibly building software
2. Reproducibly installing software

These can roughly be thought of as "the developer problem" and "the user problem," respectively.  I have yet to see a tool or suite of tools which adequately treats both of these problems in a first-class way*.  We need to make it easy for developers to build their code on a variety of platforms and then distribute those builds through a variety of mechanisms (package managers).

What we are, in general, missing is the code that glues build managers to package managers.  Up until now, I believe there has been some debate about whose responsibility this abstraction is.  It doesn't fit cleanly into either the notion of build manager or the notion of package manager. Because of this it has ended up being no one's responsibility and therefore - where the problem is difficult in scientific computing - we end up with a poor user experience.

What we need is one of two things at minimum:

1. one cross-platform build manager which is willing to generate packages for many package managers
2. one package manager which is willing to act as a build manager and is benevolent enough to generate packages for many other package managers

The reason many package managers must be supported is that it is the right thing to do by the users.  This allows users to continue to use their favorite package manager.  Or, failing that, the user may try other package managers until they find one which works.  This is a more robust system than the one builder, one packager setup because users have many mechanisms for getting the code and its dependencies.  

The list of platforms we need to target is fairly easy: Linux, Mac OSX, and Windows.

The list of package managers that we need to target at a minimum is as follows: apt, conda, macports, homebrew, pip/easy_install.  Others should follow fairly easily from these.  I dream of a world where every scientific computing package (not just the Python ones) will be installable via all of these mechanisms.  This suite is chosen to cover the major Linux package manager, the two major Mac OSX ones, and two cross-platform package managers (pip & conda) so that we have coverage of Windows.  I'll leave the relative merits of pip vs conda for someone else to detail, but between them we have good coverage of user-space & system-wide installations.

The two remaining choices are "What build manager do we choose?" and "How do we make it easy to automate building on multiple different platforms?"

How we automate can be done in a number of well-understood ways.  I think that BaTLab is nice because it is free and managed.  This would get us a nice first cut.  If we end up needing to move to some other, larger system which we manage ourselves, we can do that at a later date.  Note, I have been working on this topic in the past couple of weeks, even though I withheld sending this email.  In this time, I built a GitHub-to-BaTLab web service similar to Travis-CI called Polyphemus (http://polyphemus.org/).  It is still a bit experimental and we are working out the kinks for Cyclus right now. After these are hammered out, we'll do a release.  This should be easy enough to extend to Bitbucket as well, if there are any brave souls.

What build manager do we choose?

This remains the big question to me. There are three primary options in my mind: cmake, hashdist, and conda.  I am willing to consider other options, but I am not willing to pick more than one to start with. The fourth option, starting a new project to perform the task of gluing build managers to package managers, is currently untenable since we haven't fully explored the existing options.

CMake is a nice option because it builds everywhere and can produce stand-alone packages for the various platforms.  However, it kind of sucks at knowing how to install Python code.  It is also not clear how much success we would have pushing the required changes upstream and having them maintained.  I don't think that this would be impossible, but it might be harder than desired.

Conda is an option, even though it is nominally a package manager, because (so I hear) it has a mechanism for interacting with PyPI.  Travis, do you think that conda as a builder could be extended to support the suite of other package managers mentioned?  If so, what support would Continuum provide?  Namely, would it be limited to accepting PRs, or would developer time be put on it?

Also, to answer your question, Travis: I personally have had a lot of trouble installing PyNE on top of conda in the past. I have tried many times over the past couple of years. I have typically been able to get it working eventually, though it often involves installing my own version of HDF5 and then doing custom path edits to make PyNE see my version of HDF5 rather than conda's.  Obviously this is not a solution that is ready for users.  Because it wouldn't have worked with conda's HDF5, I didn't try building a PyNE conda package.

I'd still like to consider conda as a software builder, but I need the conda team's guidance on how feasible this is.  My understanding is that yt has had some success using conda in this way, but it involves creating a parallel universe of packages - duplicating some, like HDF5.  I think that this is OK in general.

Hashdist comes up here because Aron and I had a call about it right before SC13.  It was during this call that the distinction between the user's needs and the developer's needs really came out.  In my previous discussions with Dag about the role of hashdist, it became clear to me that it really was not a package manager or a user tool.  It also wasn't clear to me at the start of talking to Aron how to go about connecting hashdist to the various package managers, and (he can correct me if I am wrong) I don't think it was clear to Aron that front-ending to package managers was a valuable thing to do. However, by the end of that conversation we had come to the agreement that - barring anything else - if Aron could get a couple of front ends working (say brew and cygwin, since he is familiar with those) in the next month or so, then we would have a pattern to build on for the other package managers.

The Proposal / Next Steps:

I think that we should wait for Aron to create proof-of-concept front ends for certain package managers in hashdist.  Meanwhile, I'll be worried about creating a service for submitting builds to BaTLab on the platforms mentioned.  I'll also worry about drumming up financial support, support from the various projects, and investigating what it will take to interface with the various other package managers.

After Aron has these proofs of concept, we'll have to extend hashdist to support the other package managers.

Finally, we'll need to finish up the build service.  

Timeline: 3 months.

Other Thoughts:

Whatever is built as a result of this effort needs to make life noticeably better.

Travis, I am still deeply interested in using conda as a build manager. If you strongly feel that this would be a more successful choice or less work than hashdist I'd love to hear the reasons.  Note that whichever build manager choice we go with, I'll still be targeting conda as a package manager since it is open source and supports user-space on Windows. 

Sorry all for the huge brain dump here.  This is a couple of weeks of me thinking seriously about what the needs and wants are and how we can tackle them efficiently.  

Again, I am giving up PyNE development time to work on this.  The other PyNE developers are very supportive of this because they feel the same pains I do.  They have agreed to pick up some of my slack in this time period.  In light of that, I am personally interested in getting to a minimally viable and sustainable solution as quickly as possible.  

Be Well
Anthony

* Except for source-based Linux distributions. I'll get off of my Gentoo/Arch high horse now ;).

Matthew McCormick

Nov 25, 2013, 11:42:33 AM
to numf...@googlegroups.com, PyNE Dev, Aron Ahmadia, Bill Hoffman, Brad King, Jean-Christophe Fillion-Robin
Hi Anthony,
Overall, I think this is an excellent assessment of the situation.

>
> What build manager do we choose?
>
> This remains the big question to me. There are three primary options in my
> mind: cmake, hashdist, and conda. I am willing to consider other options
> but I am not willing to pick more than one to start with. The fourth other
> option, start a new project to perform the task of gluing build managers to
> package managers, is currently untenable since we haven't fully explored
> existing options.
>
> CMake is a nice option because it builds everywhere and can produce
> stand-alone packages for the various platfoms. However, it kind of sucks at
> knowing how to install Python code. It is also not clear how much success
> we would have pushing the changes required upstream and having them
> maintained. I don't think that this would be impossible, but it might be
> harder than desired.

First, I agree that CMake sucks at knowing how to install Python
software - some other solution would be needed there. But I think
it is definitely the best solution for issue one -- reproducibly
building cross-platform software.

CMake is an open source project. Contributed changes are very
welcome. There is a large, very active community supporting the
project. Also, Kitware is using and trying to support scientific
Python more and more. And there is acknowledgement that the best
solution for everyone is a system that works in harmony with projects
like conda and hashdist and with all participants in groups like
NumFOCUS.

Thanks,
Matt

Frédéric Bastien

Nov 25, 2013, 1:00:06 PM
to numf...@googlegroups.com
Hi,

It is great to see this problem being worked on. I have just one thing
to add: can people try to keep the system open, in the sense that
other projects could add their own platforms? For example, there is no
CI or build service that supports GPUs, to my knowledge, and we need this.
If a system allowed us to add one of our computers to it, it
would enable us to test on the GPU in that system.

The GPU is just an example; the same will happen for every kind of "special"
hardware. Allowing users to contribute platforms would be great.

Frédéric

Travis Oliphant

Nov 25, 2013, 1:29:28 PM
to numf...@googlegroups.com
On Mon, Nov 25, 2013 at 8:46 AM, Anthony Scopatz <sco...@gmail.com> wrote:
Hello Travis, Everyone, 

Sorry for taking so long to get back to everyone on this.  I have been busy both traveling and clearing stuff of off my plate to prepare for this task (see the PyNE mailing list).  I am definitely still planning on spearheading this effort, mostly out of my selfish needs for PyNE, yt, PyTables, etc.  

Luckily, the travel time has given me ample opportunities for reflection for this topic.  I think that we have two problems that often get conflated into a single solution:

1. Reproducibly building software
2. Reproducibly installing software 

Yes, these are different issues, though they are related as you describe.  Installing the software has (typically) been solved with packaging solutions, and these packaging solutions rely on the software authors to at least have a reference build approach.  Then, the majority of the work happens with all the myriad configuration details that can accompany building under various conditions.
 

These can roughly be thought of as "the developer problem" and "the user problem," respectively.  I have yet to see a tool or suite of tools which adequately treats both of these problems in a first class way*.  We need to make it easy for developers to build their code on a variety of platforms and then distribute these builds through a variety of mechanisms (package managers).  

There are actually a lot of tools that treat these problems in a first-class way.  The challenge is the accessibility and modifiability of these tools for the particular challenges being faced --- as well as a lack of knowledge of the tools.  I know that I don't know everything about CMake, for example --- even though I've seen a lot of people use it quite successfully to build.

What we are, in general, missing is the code that glues build managers to package managers.  Up until now, I believe there has been some debate about whose responsibility this abstraction is.  It doesn't fit cleanly into either the notion of build manager or the notion of package manager. Because of this it has ended up being no one's responsibility and therefore - where the problem is difficult in scientific computing - we end up with a poor user experience.

This is quite true.  It's a reason why conda build exists.  conda build does not care how you build the package (you could easily use CMake, waf, scons, make, setup.py, hashdist, whatever).  Conda is more about creating metadata around whatever choice you make and then providing a mechanism to install the result into an isolated environment.
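To make that concrete: a recipe's build script is simply the project's native build, whatever that is. Here is a sketch for a CMake-based project (the flags are illustrative, not from any particular recipe):

    # build.sh -- run by conda build inside the build environment;
    # $PREFIX is the install prefix that conda build provides
    cmake -DCMAKE_INSTALL_PREFIX=$PREFIX .
    make
    make install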
 

What we need is one of two things at minimum:

1. one cross-platform build manager which is willing to generate packages for many package managers
2. one package manager which is willing to act as a build manager and is benevolent enough to generate packages for many other package managers

The reason many package managers must be supported is that it is the right thing to do by the users.  This allows users to continue to use their favorite package manager.  Or, failing that, the user may try other package managers until they find one which works.  This is a more robust system than the one builder, one packager setup because users have many mechanisms for getting the code and its dependencies.  

A cross-platform "build manager" is a tall order and potentially unnecessary --- as there are many existing build managers.  Perhaps hashdist could play a role here.

For #2, conda is the closest thing to what you are looking for.  It would be a waste of resources to try to recreate all the things conda is already doing for you.  It should be quite straightforward to build other kinds of packages from conda packages and conda recipes if that is ultimately needed.  It has just not been a priority.  For the purposes of assisting Windows users, I don't think it's really necessary either --- but it is potentially very distracting from the actual goal.

There are many package managers and kinds of packages, but they are not all equal and should not be treated as if they were.
 

The list of platforms we need to target is fairly easy: Linux, Mac OSX, and Windows.

The list of package managers that we need to target at a minimum are as follows: apt, conda, macports, homebrew, pip/easy_install.  

I would disagree with this list of priorities or minimum set.   I think the minimum set is conda pkg and pip whl.   Then, after that is working, if you want to target additional binary packages, you could create packages for rpm, apt, brew, and macports.   Easy converters can exist (things like alien), but there are significant challenges in getting those to work well with the packages people have already installed into their "other system".  

Also, while these are listed together, the capabilities they provide are actually quite different.  It is hard to see why pip whl should even be included, as it's not in the same category (it only installs Python packages).  The only reason it's listed here is the wide user base of people who will benefit from having whl packages widely available.  Whl does not, however, handle the dependency chain for the non-Python packages that are necessary for most scientific codes --- so you will typically have a lot of static linking or DLL-shipping with whl, which means there will not be one "whl" to rule them all.

 
Others should follow fairly easily from these.  I dream of a world where every scientific computing package (not just the Python ones) will be installable via all of these mechanisms.  This suite is chosen to support the major Linux package manager, the two major Mac OSX ones, and two cross-platform package managers (pip & conda) so that we have coverage of Windows.  I'll leave the relative merits of pip vs conda for someone else to detail, but between them we have good coverage of user-space & system-wide installations. 
 

The two remaining choices are "What build manager do we choose?" and "How do we make it easy to automate building on multiple different platforms?"

How we automate can be done in a number of ways that are well understood.  I think that BaTLab is nice because it is free and managed.  This would get us a nice first cut.  If we end up needing to move to some other larger system which we manage ourselves, we can do that at a later date.  Note, I have been working on this topic in the past couple of weeks, even though I withheld sending this email.  In this time, I built a GitHub-to-BaTLab web service similar to Travis-CI called Polyphemus (http://polyphemus.org/).  It is still a bit experimental and we are working out the kinks for Cyclus right now. After these are hammered out, we'll do a release.  This should be easily extendable to Bitbucket as well, if there are any brave souls.

Using the resources of BaTLab sounds like a great help.   Ultimately that is the missing piece --- maintained machines that can be used to do the builds.   
 

What build manager do we choose?

This remains the big question to me. There are three primary options in my mind: cmake, hashdist, and conda.  I am willing to consider other options but I am not willing to pick more than one to start with. A fourth option, starting a new project to perform the task of gluing build managers to package managers, is currently untenable since we haven't fully explored existing options.

We should set up a call so we can discuss this in more depth --- the way you have cut the problem is not the only way to think about this and I fear it's creating more complexity than necessary for the task at hand.     We need to explain to you better what conda does, because I don't think you understand how it actually solves most of the problems you have outlined.

It is not about *either* cmake, conda, or hashdist.   Upstream projects will choose what mechanism they will use to build their package.  They will want to do the easiest thing.   We would all be better off if people were using cmake, waf, or other tools instead of distutils, so it is helpful to provide people with suggestions, but ultimately the project will decide how it gets built.   conda can make use of whatever the project decides to do.    It would be very helpful if projects would create conda recipe directories, and over time I think that can happen --- we already have a large collection of them available to look at and change.
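
For example, the recipe's build script is the only place the project's chosen build tool appears; conda build just runs it in a clean environment with variables such as $PREFIX and $PYTHON set.  Two hypothetical build.sh bodies:

    # build.sh for a distutils project
    $PYTHON setup.py install

    # build.sh for a CMake project
    cmake -DCMAKE_INSTALL_PREFIX=$PREFIX .
    make install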

Also, hashdist is a small part of this problem.   The idea of hashing all the build configuration information (recipe and environment) is a nice one for people who like to build everything themselves (or who have to because they are on unusual hardware).   These hashes can easily fit as the build-string of a conda package so that you could use conda to manage binaries built with hashdist profiles --- including all the dependencies and creation of environments based on specific profiles. 
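
A sketch of what that could look like (the hash and version here are made up): conda package specifications already accept a name=version=build-string triple, so a hashdist profile hash could ride along as the build string.

    # package file name:  scipy-0.13.2-hd_4f3a9b1.tar.bz2
    conda install scipy=0.13.2=hd_4f3a9b1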

Conda can work with *both* cmake and hashdist.   I don't think this is the major issue here.    Ilan and I would love to talk with you in person about conda.  There are definitely additional tools that are necessary and useful, but conda is quite far along in terms of solving the major issues. 
  

Also, to answer your question, Travis: I personally have had a lot of trouble installing PyNE on top of conda in the past. I have tried many times over the past couple of years.

It would be great to hear specifically what problems you have had.   The conda build command has only been available for a few months, so I don't see how you could have really tried it.    Perhaps you had difficulty building from pre-built conda packages which we have made available in the past --- which can definitely happen.      

Nothing stops you from creating your own conda package hierarchy with binaries built exactly to the specifications you require.   I understand that this gets tedious, but it's also the fundamental problem you will face with any solution.    
 

I have typically been able to get it working eventually, though it often involves installing my own version of hdf5 and then doing custom path edits to make PyNE see my version of HDF5 rather than conda's.  Obviously this is not a solution that is ready for users.  Because it wouldn't have worked with conda's hdf5, I didn't try building a PyNE conda package. 

I don't recall seeing any messages from you about these difficulties, but I could have missed them.   It would be great to hear what trouble you had.   If we need to build hdf5 differently or better, then we should do that.   If you have a better build-recipe for hdf5, then let's use that for the "standard".    When was the last time you tried it and on what platform?  

There is really no such thing as "conda's hdf5".   Conda can build many different kinds of hdf5 packages.   Which hdf5 package you install depends on which "channel" or repository you are set up to look at. 

Now, there is Continuum's Anaconda channel, which we make freely available, within which we have a specific hdf5 package for each platform we have defined.   But it's entirely conceivable that the build options selected for the Anaconda package are not appropriate or desired.  Most of the time these issues can be resolved with bug-reports and different build-flags, etc.    If there are truly different configurations that are needed (i.e. the idea of "platform" needs to be expanded), then this is exactly why conda build and binstar.org exist --- to allow different binary chains to exist. However, Windows users are going to want a single install chain for their platform (32-bit or 64-bit).   The Anaconda channel provides that right now.   
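
As a sketch of how that works in practice (URLs hypothetical), channels are simply an ordered list in your .condarc, and a plain install resolves against whatever channels you have listed:

    # ~/.condarc
    channels:
      - https://conda.binstar.org/my-lab     # your own hdf5 builds, if you need them
      - http://repo.continuum.io/pkgs/free   # Continuum's free Anaconda channel

    conda install hdf5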

In fact, I would argue that when people claim they want their own "packages", that is (mostly) a cognitive substitute for the fact that different build environments effectively create different "platforms" besides the traditional Linux, OS X, and Windows "names".   In effect, your "platform" is whatever combination of system libraries, default environment variables, and dependencies your "packager" has provided you.    Each new package you install is sitting on that "platform".  

The great challenge of most python installations is that basically everyone is on their own platform because of the way they have obtained / built the tree of dependencies they currently sit on.   Fortunately, many of these platforms are compatible with each other (especially in Python-only land).   However, once you add binary libraries obtained from other build processes, then the fun begins....

Please let us know what issues you have had with HDF5.   I suspect we can resolve these issues and then PyNE can be a conda package pretty easily.  

Hashdist comes up here because Aron and I had a call about it right before SC13.  It was during this call that the distinction between the user's needs and the developer's needs really came out.  In my previous discussions with Dag about the role of hashdist it became clear to me that it really was not a package manager or a user tool.  It also wasn't clear to me at the start of talking to Aron how to go about connecting hashdist to the various package managers and (he can correct me if I am wrong) I don't think it was clear to Aron that front-ending to package managers was a valuable thing to do. However, by the end of that conversation we had come to the agreement that - barring anything else - if Aron could get a couple of front ends working (say brew and cygwin since he is familiar with those) in the next month or so, then we would have a pattern to build on for the other package managers. 

Hashdist is interesting for its ability to potentially create individual "platforms" for people.   It is especially useful in the HPC space where people need to compile everything and have a hard time relying on "other people's builds" of the software they need because of all the configuration possibilities with things like MPI libraries and linear algebra tools.    It is also still very much a work in progress.  

I think we can get a PyNE conda package built for your users in a matter of hours to days. 
 

The Proposal / Next Steps:

I think that we should wait for Aron to create proof-of-concept front ends for certain package managers in hashdist.  Meanwhile, I'll worry about creating a service for submitting builds to BaTLab on the platforms mentioned.  I'll also worry about drumming up financial support, support from the various projects, and investigating what it will take to interface with the various other package managers.

I strongly disagree that this is the appropriate direction to take for the purposes of helping Windows users and developers.   I would suggest that we build a conda package for PyNE, and show you the solutions that already exist for managing the process right now.  

 
Other Thoughts:

Whatever is built as a result of this effort needs to make life noticeably better.

Travis, I am still deeply interested in using conda as a build manager. If you strongly feel that this would be a more successful choice or less work than hashdist I'd love to hear the reasons.  Note that whichever build manager choice we go with, I'll still be targeting conda as a package manager since it is open source and supports user-space on Windows. 


I definitely strongly feel this would be more successful and significantly less work than hashdist.   I've done this multiple times, and I see the direction you are going but have to warn you against it.   It's nice in theory, but it's much harder to pull off than you think --- and it's not particularly relevant for Windows users.    I don't think we need at this point to be working on a "build manager" for the purposes of helping Windows users.   Of course, if you and Aron want to work on improving hashdist, you can.  I just think that it's not the most effective use of resources right now for solving the problem of helping Windows users with their build issues.   

I see three things that might be useful: 

  * automatically creating .whl packages from conda packages (possibly this could be extended to other package managers --- but until those package managers get the notion of environments, they are considerably less useful than conda); a rough sketch of what such a converter would do follows this list.  

  * working on improving conda recipes for various platforms.   We have been doing a lot of work to make this happen and have given almost all of this work away for the sole purpose of improving the community.   We could use help on building better packages and getting more testing and fixing bugs in conda. 

  * getting Windows boxes available for build and test --- if BaTLab has some that could be used, then great, let's do it.  Otherwise, let's get them available from whatever source we can.  
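
No such converter ships with conda today, so the following is only a rough sketch of the mechanical part (file names are hypothetical, and a real converter must also generate the wheel's .dist-info meta-data from the conda meta-data):

    # unpack the conda package
    mkdir work
    tar xjf mypkg-0.1.0-py27_0.tar.bz2 -C work

    # the Python payload sits under site-packages inside the package
    cd work/Lib/site-packages        # lib/python2.7/site-packages on Linux/Mac

    # generate mypkg-0.1.0.dist-info/{METADATA,WHEEL,RECORD} here, then:
    zip -r ../../../mypkg-0.1.0-cp27-none-win_amd64.whl .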

Right now, each project has an approach to build.   The conda recipes respect that and allow you to do whatever you need to do to get the build finished.     If I understand what you mean by "build manager", then mostly what is needed is build resources (machines), a "queueing system" for managing what runs on those machines, a collection of conda build recipes (which involve the correct configuration of the machine and then a build script), and conda build.     
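
Put together, a single build-worker iteration could be as small as the following (recipe and package paths are illustrative; the upload step assumes binstar's command-line client):

    # fetch the community recipes and build one in a clean environment
    git clone https://github.com/ContinuumIO/conda-recipes
    conda build conda-recipes/mypkg

    # publish the resulting artifact to a binstar channel
    binstar upload ~/anaconda/conda-bld/win-64/mypkg-0.1.0-py27_0.tar.bz2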

Time for this:  2 weeks

Let's definitely set up a call so I can understand better how you are thinking about the problem and see if there is anything I'm missing about what you need.    I do think you would benefit from a call with us.  Ilan and I have at this point been intimately involved in the creation of two widely used Python distributions and understand quite a bit about different ways to do this --- we have failed multiple times already trying different approaches and have a good idea of what works and what ultimately doesn't.   This is particularly true for Windows users.    There is no magic bullet for some of the problems which are a natural result of the combinatorial explosion of possible "platforms" that could be defined.  Benefiting from the network effect of someone having a cached binary you can use means finding good common "platforms".    conda/binstar provide a very useful framework within which to solve these issues. 

There are other useful tools to be sure (particularly suited to the use-cases they were designed for).  Conda was designed to make it easy to reproduce environments for technical computing across a wide range of platforms.   Building software is an example of "technical computing".  It just requires a particular environment.    There is absolutely no reason not to leverage the work we have done here.   I'm very familiar with hashdist and respect the work they are doing and encourage anyone to contribute to that project.    We have met with the hashdist authors several times and I think I have a good understanding of how conda and hashdist overlap and tackle different problems. 

For helping projects with their Windows build problems, conda is much, much closer to a solution that can be useful as it already works for that purpose now --- it's just a matter of automation and fixing bugs that might arise.

Best regards,

-Travis

Matthew Turk

unread,
Nov 25, 2013, 1:50:27 PM11/25/13
to numf...@googlegroups.com
Hi Travis,

I had a number of thoughts about your thoughtful and comprehensive
reply to Anthony, but I wanted to just jump in with an explanation
about one item.
This is the most prominent of the issues we ran into with yt on Conda:

https://github.com/ContinuumIO/anaconda-issues/issues/18

I believe this is similar or identical to what Anthony is describing.
Ultimately this largely scrapped our plans to use conda as the
primary deployment platform for the 2.x version of our codebase;
instead, we removed the build-time HDF5 dependency (in favor of a
run-time h5py dependency) in the 3.0 line of development, and we are
now encouraging conda for 3.0. I wrote to the anaconda-support mailing list
some time ago asking for help getting conda set up as a
"Developer-friendly" environment, but there were no replies (and I am
likely to blame here; I am sure that the question could have been much
better posed).

https://groups.google.com/a/continuum.io/d/msg/anaconda/N_NQcCcWFqo/3F0pU36bCP8J

Ultimately, though, I think that conda as a solution is a very good
idea, and one that I am *personally* dogfooding for the projects I
work on, but what we have run into is a problem getting individuals up
to speed on how to interoperate conda with other package systems, with
building their own items and developing simultaneously, and with
encouraging individuals to deploy it on systems like supercomputers
where quotas may be quite small and we may have very good reasons to
interoperate with the system builds. The specific pain points include
determining precisely what magic occurs during a "conda build"
statement (in terms of RPaths, etc), identifying where the "up to
date" documentation is (I believe it is a union of documentation here:
http://docs.continuum.io/conda/index.html and blog posts) and
automating processes.

-Matt

Travis Oliphant

unread,
Nov 25, 2013, 4:25:05 PM11/25/13
to numf...@googlegroups.com
Thanks for the feedback and example.  I think this actually illustrates my points very well about the misunderstandings that exist with conda.   Conda is a package manager that can integrate with a wide variety of build tools -- and therefore help with the build management.  

The problem is *conflating* conda, the tool, with our particular set of builds available in Anaconda.   You can use conda and never use any of our Anaconda binaries --- you can also use Anaconda binaries without really using conda as well. 

I agree that we could provide more guidance here about how we build Anaconda binaries themselves.  We are moving in that direction.  We are actually looking for build engineers who could help us do this. 

We are doing our best to create documentation around conda and how it can be used to easily create your own stack.    Anaconda itself must necessarily choose a particular set of system libraries (i.e. the platform) to use for its build dependencies (XCode version, glibc, compiler versions, etc.).    Most of the (non-bug) issues I've seen --- and this seems no different --- arise because we build against a particular set of dependencies (Mac OS 10.5, for example) and the build dependencies of another package you are trying to build may not be compatible with the run-time we have built our binary package against.   We see similar issues for the version of GLIBC.  

These are all eminently solvable *with* conda. It's just a matter of building a different package with a different compiler, different flags, or a different set of dependencies.    I would suggest that it doesn't make sense to say that you are "blocked using conda" (except for documentation improvements and some information about any RPATH "magic"), but I can see that you are "blocked using our pre-compiled binaries" for any number of libraries and need different binaries.    One of two options is possible:  1) help us build "better" binaries that cover a wider variety of users' needs -- perhaps it's time to move from OS 10.5 as a base or 2) build better binaries for your own needs and host them on binstar.   

In the end building useful packages for others is hard.  Conda gives you a way to store the meta-data about how to do that, create environments within which to try it out, and binstar provides a place to host those binary artefacts. 

The feedback is very much appreciated.   We welcome any input and help with improving conda.   

Best,

-Travis


Nathan Goldbaum

unread,
Nov 25, 2013, 4:31:20 PM11/25/13
to numf...@googlegroups.com
Hi Travis,

What I don't understand is why I am able to build yt under 'conda build' but not under 'setup.py install'.  In both cases I'm trying to link against Anaconda's libraries (in this case HDF5) but for some reason the former succeeds while the latter fails.

Can you explain what I can do to replicate the build environment under conda build?

Sorry for being a little off-topic, this is something that has puzzled me in the past.

Nathan

Aaron Meurer

unread,
Nov 25, 2013, 5:20:21 PM11/25/13
to numf...@googlegroups.com
One thing that I think is important to remember, which I think hasn't
really been mentioned in this thread yet, is that conda packages have
to be relocatable. conda build employs various tricks to achieve this,
see for instance
https://github.com/ContinuumIO/conda/blob/master/conda/builder/macho.py#L40
and https://github.com/ContinuumIO/conda/blob/736111e24e7d2b783d910c9b3cb514c8698e813d/conda/builder/post.py#L155.
I believe the primary source of these build failures comes from the
failure of these tools to work correctly. If you want to know more
about the technical details of this, though, you'll have to ask Ilan.
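
For readers unfamiliar with the Mach-O side, the effect of those tricks is roughly what you would do by hand with Apple's tools (paths here are illustrative); the post-processing rewrites hard-coded install names so a library no longer points at the build machine's prefix:

    # inspect the install names a library was linked with
    otool -L libhdf5.dylib

    # rewrite them so lookup is relative to the loading binary
    install_name_tool -id @loader_path/libhdf5.dylib libhdf5.dylib
    install_name_tool -change /build-prefix/lib/libz.1.dylib \
        @loader_path/libz.1.dylib libhdf5.dylib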

Aaron Meurer

Matthew Turk

unread,
Nov 26, 2013, 8:17:44 AM11/26/13
to numf...@googlegroups.com
Hi Travis,

I'm going to reply to a few items you've raised as they relate to the
original topic of a community build service, but I also think at this
time I'm prepared to move any further discussion of the issues we've
specifically had with conda to the anaconda-discuss list.

On Mon, Nov 25, 2013 at 4:25 PM, Travis Oliphant <tra...@continuum.io> wrote:
> Thanks for the feedback and example. I think this actually illustrates my
> points very well about the misunderstandings that exist with conda. Conda
> is a package manager that can integrate with a wide variety of build tools
> -- and therefore help with the build management.
>
> The problem is *conflating* conda, the tool, with our particular set of
> builds available in Anaconda. You can use conda and never use any of our
> Anaconda binaries --- you can also use Anaconda binaries without really
> using conda as well.

Yes, I can see how this is the case. Conda is a build system, and in
fact, we went a long way down the road of providing an "alternate
universe" of packages. I'm happy to provide links to the extensive
mailing list discussions where this was brought up, or to provide
information about the methods by which we were building packages using
VMs and CI servers. But where it ended was: we do not have the
resources to provide this. And I think that's where what Anthony
raised is important -- regardless of the system by which packages are
distributed, built, managed, etc, if binary packages are to be
provided, someplace to build them has to be available.

I also don't think that we can easily discount the point he made,
which I am *extremely* concerned about, which is that the more work,
magic and effort required to support a transition between "user" and
"developer" of packages living in an ecosystem, the fewer transitions
that will occur.

>
> I agree that we could provide more guidance here about how we build Anaconda
> binaries themselves. We are moving in that direction. We are actually
> looking for build engineers who could help us do this.
>
> We are doing our best to create documentation around conda and how it can be
> used to easily create your own stack. Anaconda itself must necessarily
> choose a particular set of system libraries (i.e. the platform) to use for
> it's build dependencies (XCode version, glibc, compiler versions, etc.).
> Most of the (non-bug) issues I've seen --- and this seems no different ---
> are because we build with a particular set of dependencies and another Mac
> OS 10.5 and it's possible that other build dependencies for another package
> that you are trying to build is not compatible with the run-time we have
> built a binary package against. We see similar issues for the version of
> GLIBC.
>
> These are all eminently solvable *with* conda. It's just a matter of
> building a different package with a different compiler, different flags, or
> a different set of dependencies. I would suggest that it doesn't make
> sense that to say that you are "blocked using conda" (except for
> documentation improvements and some information about any RPATH "magic"),
> but I can see that you are "blocked using our pre-compiled binaries" for any
> number of libraries and need different binaries. One of two options is
> possible: 1) help us build "better" binaries that cover a wider variety of
> users needs -- perhaps it's time to move from OS 10.5 as a base or 2) build
> better binaries for your own needs and host them on binstar.

Reading my email and your reply, I realize what I said may have been
aggressive, which was not my intent -- I am sorry. My usage of the
term "blocked using conda" only applied to the release of version 2.6
of yt, and the encouraged method of adoption.

We really, really wanted to use Conda for this release. We spent a
considerable amount of time testing, developing scripts to install
Miniconda seamlessly, building recipes, understanding the build
system, spinning up VMs, deploying these on our CI server, requesting
testing from others, reporting issues on github, submitting recipes,
trying to engage on the anaconda mailing list, and on and on.

And, in fact, we wanted to use conda so badly that we decided we would
*rewrite* a relatively large and fundamental routine in our code base
so that we would no longer have to have a build-time requirement of
linking a C library against HDF5. Unfortunately, this change could
not be ported to our older code base and can only go in our new
version, and so we ended up deciding that we would start encouraging
Miniconda/Conda/Anaconda as a deployment strategy for the *next*
version of our code.

I guess at this point, what I'm saying is: we really want to use
Conda. But even for our use case, we weren't able to make it work on
the time and energy constraints available to us as a project. In
order to make it work for our use case, we actually modified our
project in a non-trivial way, but in a way that's not available to
PyNE.

>
> In the end building useful packages for others is hard. Conda gives you a
> way to store the meta-data about how to do that, create environments within
> which to try it out, and binstar provides a place to host those binary
> artefacts.
>
> The feedback is very much appreciated. We welcome any input and help with
> improving conda.

We've very much tried to provide input and help with Conda, and I hope
I do not sound like I am denigrating the efforts put forth by
Continuum. I'm conscious of the fact that Conda is a package that is
provided free of charge, but what I'm attempting to identify here are
the ways that the system Anthony has described meets different needs
and addresses slightly different problems.

Something that is perhaps unspoken here is that the POV of the
individual or organization providing a piece of software, attempting
to gain uptake of that software, and also attempting to foster an
ecosystem is likely very different from the POV of the packaging
provider. As a project member, my vested interests are in making it
very easy for anyone to install the project I work on, to use it, and
most importantly to contribute changes back upstream. So if someone
comes to me and says, "I would like to install your project using
MacPorts" or "I have brew installed my Python the way I like it, how
do I get your project into that system?" I actually very much want to
make that happen. It's much harder for me to say, "You should cease
using MacPorts and install Conda."

But what's come out from your emails and comments on the issue is that
you would like to see Conda, as a build system, emit the necessary
metadata for MacPorts/Apt/etc, or to be able to live within and/or
drive that ecosystem. I admire that, I hope that this discussion will
bear fruit, and I'd like to support a community build service in
whatever way I can.

Best wishes,

Matt

Aron Ahmadia

unread,
Nov 26, 2013, 10:52:37 AM11/26/13
to numf...@googlegroups.com
Hi Folks,

I've been reading and following this discussion and resisting the temptation to dive in too deeply so far.  Most of you know me, but for those who don't, I work for the US Army Corps of Engineers, one of the principal organizations that has been driving HashDist development.  Our mission has always been to provide a portable, build-centered platform capable of interacting with other package managers and managing complex building dependencies in an intelligent manner.  This is a very different goal from many of the other tools out there, as I think has already been pointed out, so I am most interested in finding ways to collaborate with existing projects without duplicating effort.

I owe you all a longer email explaining a little more about what HashDist is, how it's evolved, and where we are going, but I wanted to quickly address this point Matt just made:


But what's come out from your emails and comments on the issue is that
you would like to see Conda, as a build system, emit the necessary
metadata for MacPorts/Apt/etc, or to be able to live within and/or
drive that ecosystem.

This is exactly the direction I'm taking HashDist next month, as Anthony alluded to in an earlier email.  We already have very good support for working on top of pre-existing software, and are now designing the infrastructure for emitting recipes to package managers.  I am happy to approach this in a collaborative way with other open projects, and conda is definitely one of our early targets.  My only concern is that we don't duplicate our efforts in implementing this, so I will try to keep everyone else aware of what we're up to.

Thanks,
Aron

Travis Oliphant

unread,
Nov 26, 2013, 1:24:51 PM11/26/13
to numf...@googlegroups.com
Hey Matthew, 

Thank you for your email and your willingness to provide such detailed information.   This helps us understand how you are thinking about the problem and also your use-cases.   

I understand the issues you are raising and wish we had time to help you more specifically with the problems you had.  I don't think these problems are isolated "conda" problems, however.    I can appreciate that conda does not provide all the help to make it easy to build things.   Conda is *not* a "build-tool"; it is a simple wrapper around build tools and a mechanism to create relocatable packages that make it easy for users.    

If you could point to other issues raised it would be very helpful.  The one issue you did point to looked like conda actually worked for you but there was a challenge with reproducing a build environment so that pip install would work with an Anaconda-produced hdf5 library.    This is at the heart of *all* build issues and why we have built conda the way we did.   The example in that issue illustrates more the brittleness of pip install than issues with conda.     If there are other examples you can point to that would be helpful.     

I *know* that h5py and yt and PyNE can be distributed as conda packages (because we do it all the time).  This may require some editing of the setup.py script and different builds of the dependencies --- but this is the nature of creating easily installed software.    

Our release of conda build as open source was an olive branch to help others who are trying to solve the same problem *we* have solved and to scale the community.   We will continue to provide conda packages for many projects and just wanted to give people the ability to be able to provide them as well.    This process is still maturing and will continue to mature which will mean better developer tools.   This just takes people to do it -- many, many people.    

I encourage you not to give up on creating conda packages for your software as I can promise you that it will make your life easier in the end and is the closest thing to a solution to the problems you are facing in distributing your software.  Making relocatable packages is harder initially than telling people to just do python setup.py install, but the result is *much* nicer for your users and will increase adoption of your software significantly.   The issues you encounter in build are exactly the same issues your users will encounter.    All the problems you faced with conda build are the problems your users will face trying to get the packages installed.   You can push this challenge off to different packagers --- packagers who don't have a vested interest in the Python for science ecosystem --- or you can help us with conda.   

We have a chance to solve this problem for the community in a unified way that will help everyone.   Let's solve the issues you have faced.   We need to recognize that it's one thing to post an issue and another to *fix* an issue.   With our resources, we will not be able to fix everyone's issue without more help.   This is largely because the greatest amount of work is identifying the actual issue when you are talking about the very large configuration space of a "build environment".   For example, the github issue you previously linked ended in a place where the original poster was trying to get python setup.py install to work *without* the clean build environment that conda build necessarily creates.   Is this a conda issue --- it seemed that conda worked to build what it said it would?  Possibly it still is (or more likely still an issue with a particular conda package) --- but it smells a lot like an issue of getting a build environment set up.   

We have to be very careful about identifying what the real issues are.   conda does not solve *all* the problems associated with building good binaries.  In fact, it's not a *build manager* at all.   It does make it easy to distribute those binaries and does provide a framework that lets you *do whatever you want* to actually build those binaries.   

Here is the world that I see we could get to very quickly that *everyone* would benefit hugely from: 

  * every project has a conda-recipe directory 
  * curated stacks (like Anaconda) exist on binstar.org that enable *anyone* to get whatever binaries they want

  * anyone with python (obtained through whatever mechanism) could then do: 
      * pip install conda && conda init
      * conda install <favorite_curated_stack>

  * getting end-user app software installed on people's laptops and desktops is as easy as making a conda package with an entry-point and an icon and pointing a simple "launcher" app to the repository where that package will exist --- publishing an *app* becomes as easy as copying a conda package to a repository --- no more making msi, dmg, installers, etc. (a sketch of the meta-data follows this list).  
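
As a sketch of that last point (the app name and icon are hypothetical; the keys come from conda build's app meta-data section, so treat the exact spelling as an assumption), it is only a few extra lines in the recipe's meta.yaml:

    app:
      entry: myapp           # command the launcher runs
      icon: myapp.png        # icon the launcher displays
      summary: My example app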

All of this works *today* and you can see it in Anaconda the distribution.   This is all available for *anyone* to use now.  It does mean that you have to build your software and make it relocatable.   We have 2 mechanisms to make conda packages (you could write your own approach if you are ambitious): 

    * conda build <recipe_dir>
    * conda package --pkg-name PKG_NAME --pkg-version PKG_VERSION

The latter command will take *all* untracked files in your current environment and build a conda package from those files.  So, you could create an environment with all the dependencies and then install the software *however you want*.  Then, use conda package to build up a package --- we do not recommend this method for generally creating packages because you will end up with a package that may need to be installed in *exactly the same way* it was built (non-relocatable).  You may also end up with additional files in the package that should really be part of *another* package.  But, it is a developer tool that you can use.  
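
A hypothetical walk-through of that second mechanism, for software installed however you like into a scratch environment:

    conda create -n scratch python numpy     # environment with the dependencies
    source activate scratch                  # on Windows:  activate scratch
    python setup.py install                  # or make install, or anything else
    conda package --pkg-name=mypkg --pkg-version=0.1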

There is nothing wrong inherently with making other binaries for other package managers, but it does not really help most of the users and potential users of the NumPy stack --- it just increases the number of "platforms" that every project has to support and will continue to divide the resources of our community and make it more difficult to create a useful package.    Having support from various package maintainers can be a strength long-term, but I would strongly argue that focusing our energies around making conda packages and then "converters" from conda packages to other install-types (starting with .whl) is *far more* beneficial for the community than trying to come up with the ideal "build-manager".   

We have an approach that works with conda.   We have evidence that it works with Anaconda.   There are still issues to iron out that we need others to solve (that's why the code is all open source).    We are sharing all of this precisely because I'm tired of people *not* using various tools because it's so *hard to install*.    Building software is still hard.  Our tools don't somehow magically make that (mostly configuration) challenge go away but they do let you record the configuration that worked and share that with others in a full-stack reproducible way.   

For Windows users, especially, conda packages give them the same benefit that other platforms with package managers have enjoyed.  Other platforms have package managers which have helped (but lacking environments these package managers are still sub-par for most uses --- I will again emphasize the importance of environments).   Having environments integrated into the packaging tool is extremely helpful. 
  
Improving the documentation and the tooling so that everyone can do it starting from whatever knowledge they currently have is a social problem and one I'm asking for help with.    This help *requires* people's time and willingness to learn and engage and try and fail and try again.   It does not require building an entirely different tool which will just have the same social properties because building is still hard.   

It is definitely true that improved tools around conda to help developers would be useful, but that is what I'm suggesting people work on.   I don't think it is helpful to the goal of getting a Windows build service for people to work on orthogonal tools.   Creating different packages (.whl, brew, rpm, etc.) from conda packages is straightforward meta-data translation.  

I'm sorry for the long rant.   But this is an important question, and while I think there are many problems to solve, I am *extremely* familiar with the problems you are facing in distributing binaries of your software and conda was built precisely to solve those problems and it *does* solve those problems.   The software engineering around conda could absolutely be improved to make it easier for more and more people to use their own resources to self-solve their own build problems.    Please continue to work with us.    I fear that most of the problems are actually *unlearning* the bad habits of distutils and using different build and configuration tools besides distutils.  

We don't have a lot of spare cycles, but we will put on our roadmap getting yt and PyNE working as conda packages in the next month or two.    This is both because of your willingness to engage and also because I want to make sure we understand what issues may exist with both the binaries created in Anaconda and the process to build new binaries from those.     

In summary, I don't have any problem with people working on whatever scratches their particular itch.   This is the energy source of all open source and it's powerful.   This will naturally result in different "cuts through the problem" and tools that seem to overlap but don't interoperate.   But, for the purposes of Wes's original question which is a build system that will help Windows users, conda is the closest tool that actually helps this problem.    

I see hashdist as a useful tool that conda could possibly use to actually build a binary (and part of the hash would be the "build-string" in the conda package that results).   But, this is not the biggest problem people face when building Windows binaries where the "platform" is more regular than most supercomputing environments.    If you are trying to build packages to install on super-computers, then hashdist is an interesting tool.    But for making packages for laptops and desktops I don't see how working on hashdist makes sense over improving conda and making ".whl" packages from conda packages. 


More comments in-line...


On Tue, Nov 26, 2013 at 7:17 AM, Matthew Turk <matth...@gmail.com> wrote:
Hi Travis,

I'm going to reply to a few items you've raise as they relate to the
original topic of a community build service, but I also think at this
time I'm prepared to move any further discussion of the issues we've
specifically had with conda to the anaconda-discuss list.

On Mon, Nov 25, 2013 at 4:25 PM, Travis Oliphant <tra...@continuum.io> wrote:
> Thanks for the feedback and example.  I think this actually illustrates my
> points very well about the misunderstandings that exist with conda.   Conda
> is a package manager that can integrate with a wide variety of build tools
> -- and therefore help with the build management.
>
> The problem is *conflating* conda, the tool, with our particular set of
> builds available in Anaconda.   You can use conda and never use any of our
> Anaconda binaries --- you can also use Anaconda binaries without really
> using conda as well.

Yes, I can see how this is the case.  Conda is a build system, and in
fact, we went a long way down the road of providing an "alternate
universe" of packages.  
 
I'm happy to provide links to the extensive
mailing list discussions where this was brought up, or to provide
information about the methods by which we were building packages using
VMs and CI servers.

Please do send those links... or post them in anaconda-issues or even the conda project issues page if they are really "conda issues" and not anaconda-binary package issues.

 
 But where it ended was: we do not have the
resources to provide this.  And I think that's where what Anthony
raised is important -- regardless of the system by which packages are
distributed, built, managed, etc, if binary packages are to be
provided, someplace to build them has to be available.

Of course --- that is the biggest benefit of BaTLab, which sounds great.  
 

I also don't think that we can easily discount the point he made,
which I am *extremely* concerned about, which is that the more work,
magic and effort required to support a transition between "user" and
"developer" of packages living in an ecosystem, the fewer transitions
that will occur.

Sure, but conda is starting to help with this.    More could be done but we have a great start.  Creating environments really helps moving from "user" to "developer" as it helps isolate your work and avoid many of the pitfalls of being a user *and* developer.   Conda makes creating environments possible (even with native libraries).   
I didn't take your tone as "aggressive" --- just that someone might get the wrong idea about conda.  FUD is the biggest enemy of progress as people decide on the basis of almost no information to put energy behind the *wrong* things all the time.   I feel very strongly that *conda* is *the right thing* --- but we need more people to put their energy behind it to help make it easier and even more useful.  
 

We really, really wanted to use Conda for this release.  We spent a
considerable amount of time testing, developing scripts to install
Miniconda seamlessly, building recipes, understanding the build
system, spinning up VMs, deploying these on our CI server, requesting
testing from others, reporting issues on github, submitting recipes,
trying to engage on the anaconda mailing list, and on and on.

The github issue trackers for conda and anaconda are the right places for this.   The community around conda is still small and tools are still being developed.   I saw a few emails and one github issue but not many.  I could have missed them, of course.   
 
Thank you for your efforts in this regard.   I wish we would have been more aware of what you were doing (not that we would have been able to alter our engagement as we have been especially busy these past months improving various aspects of conda and our other tools).   But, right now it's me, Ilan, and sometimes Aaron Meurer that helps answer conda questions in our spare time.   Most of these questions turn out to be "build configuration" questions rather than conda questions and it can be hard to sort that out.   

Working together to build a yt recipe is the *first* step before doing anything else that you described.   Of course, ideally you can do this on your own, but the conda build command is new (only about 5 months old) and has not seen much testing outside *our* configurations. 


And, in fact, we wanted to use conda so badly that we decided we would
*rewrite* a relatively large and fundamental routine in our code base
so that we would no longer have to have a build-time requirement of
linking a C library against HDF5.  Unfortunately, this change could
not be ported to our older code base and can only go in our new
version, and so we ended up deciding that we would start encouraging
Miniconda/Conda/Anaconda as a deployment strategy for the *next*
version of our code.

Thank you for being willing to try again.   I'm sorry you felt a need to rewrite code.  This should almost never be necessary (only if for some reason your code is preventing the creation of a relocatable build).    I have personally now built dozens of conda packages but I know that some packages are easier than others (depending on how hard the dependency chain is to build, basically).
 

I guess at this point, what I'm saying is: we really want to use
Conda.  But even for our use case, we weren't able to make it work on
the time and energy constraints available to us as a project.  In
order to make it work for our use case, we actually modified our
project in a non-trivial way, but in a way that's not available to
PyNE.

Thanks for explaining some of the efforts and for putting in the testing that is needed for any project to work.   If you have collateral from that testing process that you can send to either the anaconda list or the conda (project) issue tracker (depending on your best guess as to whether it was an issue with the anaconda binaries or a conda command), it would be greatly appreciated.   While we don't have the resources to always act immediately on these requests (we have to respond first to customers), we do highly value the feedback and will always take it seriously.   
  

>
> In the end building useful packages for others is hard.  Conda gives you a
> way to store the meta-data about how to do that, create environments within
> which to try it out, and binstar provides a place to host those binary
> artefacts.
>
> The feedback is very much appreciated.   We welcome any input and help with
> improving conda.

We've very much tried to provide input and help with Conda, and I hope
I do not sound like I am denigrating the efforts put forth by
Continuum.  I'm conscious of the fact that Conda is a package that is
provided free of charge, but what I'm attempting to identify here are
the ways that the system Anthony has described meets different needs
and addresses slightly different problems.

I still don't see the different problems addressed aside from building different kinds of binary packages.   This is a worthy goal *eventually* but I don't see it as the critical goal to get started.   I think it's a distraction for the initial efforts of helping Windows users.  
 

Something that is perhaps unspoken here is that the POV of the
individual or organization providing a piece of software, attempting
to gain uptake of that software, and also attempting to foster an
ecosystem is likely very different from the POV of the packaging
provider.  As a project member, my vested interests are in making it
very easy for anyone to install the project I work on, to use it, and
most importantly to contribute changes back upstream.  So if someone
comes to me and says, "I would like to install your project using
MacPorts" or "I have brew installed my Python the way I like it, how
do I get your project into that system?" I actually very much want to
make that happen.  It's much harder for me to say, "You should cease
using MacPorts and install Conda."

Oh sure,  I understand this perspective very well.  I appreciate you bringing it up.   That's why I think .whl packages are a part of the story.  Other packages can be part of the story as well --- long-term, but they are in the long-tail and not first-order kinds of problems --- especially for Windows users.   Once you show someone a system with environments, I do not think they will want to go back to older-style systems.  I agree there is inertia and people don't like to change, but providing solutions that work will help them change. 
 

But what's come out from your emails and comments on the issue is that
you would like to see Conda, as a build system, emit the necessary
metadata for MacPorts/Apt/etc, or to be able to live within and/or
drive that ecosystem.  I admire that, I hope that this discussion will
bear fruit, and I'd like to support a community build service in
whatever way I can.

Any concrete feedback about conda and Anaconda packages you provide is highly valued.  In that process, it is important to deconvolve the general problem of getting build configuration correctly specified from the tool that simply lets you record and reproduce that configuration once you have it.    We should identify which issues are "conda" issues, which issues are "Anaconda" issues, and which issues are "build issues related to mixing Anaconda packages and system packages".

I'm very interested in the automatic creation of additional binary packages from conda packages, but we don't have the resources or the current incentive to work on that ourselves.   If you know of somebody with budget who would like us to do that, then please send them our way. 

Thank you for the discussion.  

-Travis

Donald Stufft

unread,
Nov 26, 2013, 5:37:15 PM11/26/13
to numf...@googlegroups.com, cgo...@uci.edu
I haven't had time to read this whole thread yet, but just as an FYI, a thing I want to integrate into PyPI at some point in the future is a build service where folks can upload a source package and have it built across a variety of platforms. I haven't done a ton of thinking around it yet and it's dependent on Source Dist 2.0, which is down the road a bit as well.

Just figured I'd throw that out there as well!

On Monday, November 11, 2013 6:31:12 PM UTC-5, teoliphant wrote:

[...]

I estimate the cost of this as at least $80k to get started (mostly in the time of someone heading this up) and about $20k / year in on-going costs.   I would welcome other estimates.     I suspect I'm being too lean.    

Continuum cannot foot the entire bill for this beyond what we have already done in creating conda and Anaconda --- and I suspect nobody else can either.    But, together we can pull this off as a community. 

Best,

-Travis

For those needing a refresher on just how much conda does for this problem, here are my slides on conda presented at the recent PyData Conference:  https://speakerdeck.com/teoliphant/packaging-and-deployment-with-conda



Andy Ray Terrel

unread,
Nov 26, 2013, 10:02:15 PM11/26/13
to numf...@googlegroups.com
Hello all,

I think it's time to split this thread. There are too many
conversations going on here:

1) Get a build service for Windows (native I assume is what Wes wants,
not cygwin)

2) Build the ultimate developer environment

I suggest moving the conda discussion to the conda list and Hashdist
as well. As someone who has now worked on four package managers, I
enjoy your enthusiasm! Unfortunately, I think task 1 is more immediate
and can be solved rather quickly. I've spent the last two years
trying to drive 2 and now Aron is the one driving that effort.

-- Andy

Anthony Scopatz

unread,
Nov 27, 2013, 5:02:26 AM11/27/13
to numf...@googlegroups.com
Hello All, 

First off, I want to second what Travis says about this being a community thing. This is definitely an issue that spans the breadth of many projects, which is what makes it a perfect candidate for NumFOCUS to take on!

I mostly agree with Andy that this can be split up into different threads.  There is a lot going on here. I somewhat disagree about what these threads are and where they should live. Anaconda-specific issues should probably go to that list, hashdist issues thither, etc.  Top-level discussions on the direction of the community should stay on NF though.

Before we go too much farther, I'd like to propose the following taxonomy, which I was careful to use above and which I hope will clarify the way we think about the issues here:
  • build systems handle the actual building of software, eg Make, CMake, distutils, autotools, etc
  • package managers handle the distribution and installation of built (or source) software, eg conda, pip, apt, brew, ports
  • build managers are separate from the above and handle the automatic(?) preparation of packages from the results of build systems
What I need is a build manager, preferably coupled to a service which automatically does the build (eg BaTLab or similar).  Frankly, I am willing to work with anyone who is willing to work with me on this. If this is conda, great!  If this is hashdist, great!  If this is cmake, great! 

This involves being able to quickly and reliably spin up a software stack (including compilers and headers) on which to build the new package. However, doing this in the context of an existing project requires a reasonable assurance that changes will be supported upstream.  Otherwise, this effort may as well be a new project... which I think is something that we all desperately want to avoid.  

PyNE is a great rigorous test-bed for the scientific computing (and not just SciPy) community because it hits most of the problems.  Its name is sort of a misnomer, as github thinks of it as a C++ project.  We ship first-class C++, Cython, and Python APIs.  I think of this diversity as a strength and as a way of wheeling people from a C++ background into the Python world.  However, support for these other languages is why Matt Turk alluded to the fact that we can't just rewrite our code to opt out of a direct HDF5 dependency.  We don't strictly rely on PyTables/h5py for all of our HDF5 needs. Linking happens at the C++ layer too.

Another issue that we have is that whatever solution we choose, we'll end up having to support the build and distribution of our dependencies, such as MOAB.  

Yet another issue that we have is that due to very serious export control issues we have to run a post-install step on the user's system to be able to install needed data based on their environment.  

The build manager should be able to handle all of these things in a simple and elegant way for the developer.  I also largely agree with Travis on the timeline.  Having a single solution which works first is the right thing to do before extending ourselves to multiple solutions, each of which may work. The problem we encountered at the PyNE tutorial was that 8 out of 10 Mac users couldn't install conda well enough to build PyNE on top of.  This corresponded with the Mavericks release, so I think this reflects more poorly on Apple than on conda, but if Miniconda had been able to bootstrap a development environment then the install likely would have worked.  

Unfortunately for me personally, while Windows is nice to have it must remain a tertiary goal for me - after Linux and Mac.  Most nuclear engineering codes do not, and never will, run under Windows.  If this is the primary goal as in Andy's point (1), I can't help as I don't have the time.  If Windows is part of a broader comprehensive effort I am ecstatic to help.  

At this point, I haven't "given up" on any solutions.  I think like a lot of people I am frustrated and disillusioned with the current state of affairs.  I just want something to work. I am willing to put forward time and effort getting something to work.  I am only able to do this now because the other PyNE developers are off doing the fun stuff :).

Sorry about the slow response times.  I am still on a working-vacation-travel-thing for the next 2+ weeks.  My time for extra-curricular activities is somewhat limited until then.  Hopefully we'll be able to hammer out basic marching orders by then.

I'll set up a call for next week.

Be Well
Anthony


Anthony Scopatz

unread,
Nov 27, 2013, 5:14:48 AM11/27/13
to numf...@googlegroups.com
Hello All, 

Here is a selfishly filled out WhenIsGood poll: http://whenisgood.net/mpg9fts  Please fill this out.  

You can see the results here: http://whenisgood.net/mpg9fts/results/tyhjwpj

I look forward to seeing people on this call.  I'll send out details as we get closer.

Be Well
Anthony

Josef Pktd

unread,
Nov 27, 2013, 10:44:04 AM11/27/13
to numf...@googlegroups.com
I second this one.

What I would like is to use LAPACK in statsmodels' cython extensions without Python overhead, build scipy on Windows with MinGW and gfortran, and have quantlib with boost and other dependencies easy to build on Windows, and rpy2 with R on Windows.
Developers and maintainers of those packages are trying to get these to work.

However, what we need right now is 1) building and testing of packages on Windows.
(getting started with packages that don't have "life, universe and everything" as build dependencies.)

Josef
 
