--
You received this message because you are subscribed to the Google Groups "NumFOCUS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numfocus+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
I appreciate this input, but I don't think we should just wait for Travis-CI to solve this unless they are directly a part of this conversation and we have some control over the solution they provide. There are many approaches to solving these issues (why not ansible or puppet instead of chef), and the needs of the NumFOCUS community are specific enough that it deserves its own solution.

We also have great support from Microsoft, who has resources available to provide us with already-provisioned Windows VMs with Visual Studio. Matching remote and local setup is really quite trivial with conda and standard VMs (we do this all the time).
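That matching step can be sketched with conda alone; the environment name and package list below are illustrative, not a prescribed setup:

```
# Illustrative: capture an environment's exact package set on one machine
# and recreate it on another (local box or provisioned VM).
conda create -n buildenv python=2.7 numpy scipy

conda list -n buildenv -e > spec.txt    # export name=version=build lines

conda create -n buildenv --file spec.txt    # run on the other machine
```

The exported spec pins exact builds, which is what makes local and remote environments match.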
On Mon, Nov 11, 2013 at 5:48 PM, David Cournapeau <cour...@gmail.com> wrote:
>
> On Tue, Nov 12, 2013 at 12:29 AM, Travis Oliphant <tra...@continuum.io>
> wrote:
>>
>> I appreciate this input, but I don't think we should just wait for
>> Travis-CI to solve this unless they are directly a part of this conversation
>> and we have some control over the solution they provide.
>>
>> There are many approaches to solving these issues (why not ansible or
>> puppet instead of chef), and the needs of the NumFOCUS community are specific
>> enough that it deserves its own solution.
>
> Ansible does not work on Windows, and Puppet lacks many basic features (I
> actually don't like Chef very much, but that's the one which works best
> today for Windows).
>
> What is specific to NumFOCUS that is not solved by travis-ci? Most scipy
> projects already use travis-ci effectively (numpy, scipy, sklearn, and pandas
> do at least), but it lacks OS X and Windows support. OS X is a function of
> money, I suspect (since that's supported by travis-ci already), and travis-ci
> has mentioned on the ML that they welcome help to make Windows happen.

Travis in many ways views Python as second class (over more popular
languages (for them) like Ruby). As a simple example, if you want to
test a package that requires matplotlib, scipy, etc., you have to
install from source, because they don't have binary packages in their
repos for all the Python versions (another, better option is to just
skip their stuff and use miniconda). Also consider this issue:
https://github.com/travis-ci/travis-ci/issues/1106. Travis CI *still*
hasn't updated PyPy from 1.9, even though 2.0 was released in May and
indeed 2.1 has been out since August. For SymPy, we have had to leave
PyPy in the expected-fail part of the build matrix since several bugs
in PyPy 1.9 cause the tests there to always fail.
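The "skip their stuff and use miniconda" route looks roughly like the following .travis.yml sketch; the installer URL, package list, and versions here are illustrative assumptions, not exact values:

```yaml
# Sketch only: bootstrap Miniconda on a Travis worker instead of using
# the provided Python.  URL and versions are illustrative.
language: python
python:
  - "2.7"
install:
  - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
  - bash miniconda.sh -b -p $HOME/miniconda
  - export PATH="$HOME/miniconda/bin:$PATH"
  - conda update --yes conda
  - conda create --yes -n test python=$TRAVIS_PYTHON_VERSION numpy scipy matplotlib
  - source activate test
  - python setup.py install
script:
  - python setup.py test
```

The point is that binary dependencies come from conda's repositories rather than being compiled from source on every build.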
>>
>> (another, better option is to just
>> skip their stuff and use miniconda). Also consider this issue:
>> https://github.com/travis-ci/travis-ci/issues/1106. Travis CI *still*
>> hasn't updated PyPy from 1.9, even though 2.0 was released in May and
>> indeed 2.1 has been out since August. For SymPy, we have had to leave
>> PyPy in the expected-fail part of the build matrix since several bugs
>> in PyPy 1.9 cause the tests there to always fail.
>
> I doubt pypy is a significant blocker for the scipy community...

PyPy is just an example. As I mentioned, it was the same way with
Python 3.3 too. My point is that the Travis community doesn't have a
vested interest in Python (much less scientific Python). I'm not
convinced that they would fully understand the needs of the scientific
Python community. I'd be happy to be proven wrong, though, because as
you've noted, they've already solved many hard problems.
Speaking with my former numpy release manager hat on: building the binaries for a release is not the most time-consuming part. Testing is, especially on Windows, where few people track master. I suspect that's the main bottleneck for most of the projects we are talking about.
On Mon, Nov 11, 2013 at 5:48 PM, David Cournapeau <cour...@gmail.com> wrote:
>
>
>
> What is specific to NumFOCUS that is not solved by travis-ci? Most scipy
> projects already use travis-ci effectively (numpy, scipy, sklearn, and pandas
> do at least), but it lacks OS X and Windows support. OS X is a function of
> money, I suspect (since that's supported by travis-ci already), and travis-ci
> has mentioned on the ML that they welcome help to make Windows happen.
And by the way, the less popular your dependency is, the less likely
you are to be able to find a Debian repo that has it. This brings us
full-circle. conda build is making this easier (I'm currently working
on using conda to install the optional dependencies for SymPy). conda
build + binstar makes it quite easy to do this, except for the small
detail that you have to actually have a Linux, Windows, etc. machine
to build the binary on in the first place.
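Concretely, the conda build + binstar flow is driven by a small recipe; the sketch below is illustrative only (the package name, version, dependencies, and URL are made up, and the exact upload invocation may differ between binstar client versions):

```yaml
# recipe/meta.yaml -- schematic conda-build recipe; all values illustrative
package:
  name: sympy
  version: "0.7.4"

source:
  fn: sympy-0.7.4.tar.gz               # hypothetical release tarball
  url: https://example.invalid/sympy-0.7.4.tar.gz

requirements:
  build:
    - python
  run:
    - python
    - mpmath                           # illustrative dependency

# Then, on each target platform:
#   conda build recipe/
#   binstar upload <conda-bld path>/sympy-0.7.4-py27_0.tar.bz2
```

The recipe is platform-independent metadata; the per-platform machine is only needed to run `conda build` itself, which is exactly the gap described above.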
> If the goal is helping the community for Windows (and OS X to a lesser
> degree), the hard part is buying, provisioning and maintaining the build
> environments. If you can provide that, you get 95% of the work. Conda (or
> another packager) is the remaining 5%.

The package manager is essential. You need to be able to build the
dependencies, and then reuse the binaries of those dependencies.
thanks,
Till
Hello Travis, Everyone,

Sorry for taking so long to get back to everyone on this. I have been busy both traveling and clearing stuff off of my plate to prepare for this task (see the PyNE mailing list). I am definitely still planning on spearheading this effort, mostly out of my selfish needs for PyNE, yt, PyTables, etc.

Luckily, the travel time has given me ample opportunity for reflection on this topic. I think that we have two problems that often get conflated into a single solution:

1. Reproducibly building software
2. Reproducibly installing software
These can roughly be thought of as "the developer problem" and "the user problem", respectively. I have yet to see a tool or suite of tools which adequately treats both of these problems in a first-class way*. We need to make it easy for developers to build their code on a variety of platforms and then to distribute those builds through a variety of mechanisms (package managers).
What we are, in general, missing is the code that glues build managers to package managers. Up until now, I believe there has been some debate about whose responsibility this abstraction is. It doesn't fit cleanly into either the notion of a build manager or the notion of a package manager. Because of this, it has ended up being no one's responsibility, and therefore, where the problem is difficult (as it is in scientific computing), we end up with a poor user experience.
What we need is one of two things at minimum:

1. one cross-platform build manager which is willing to generate packages for many package managers, or
2. one package manager which is willing to act as a build manager and is benevolent enough to generate packages for many other package managers.

The reason many package managers must be supported is that it is the right thing to do by the users. This allows users to continue to use their favorite package manager. Or, failing that, a user may try other package managers until they find one which works. This is a more robust system than the one-builder, one-packager setup because users have many mechanisms for getting the code and its dependencies.
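To make the glue idea concrete, here is a minimal Python sketch of option 1: one build description fanned out to several package-manager formats. Every name here (BuildSpec, emit_conda, emit_pip, glue) is a hypothetical illustration, not an existing tool's API, and the emitted metadata is schematic:

```python
# Sketch of a build-manager-to-package-manager glue layer.  One build
# description is translated into per-manager metadata stubs.
from collections import namedtuple

# A minimal, hypothetical build description.
BuildSpec = namedtuple("BuildSpec", ["name", "version", "depends"])

def emit_conda(spec):
    # Schematic subset of a conda meta.yaml.
    return ("package:\n  name: %s\n  version: \"%s\"\n"
            "requirements:\n  run: [%s]\n"
            % (spec.name, spec.version, ", ".join(spec.depends)))

def emit_pip(spec):
    # Schematic setup.py-style metadata line.
    return "install_requires=%r  # for %s %s" % (
        list(spec.depends), spec.name, spec.version)

EMITTERS = {"conda": emit_conda, "pip": emit_pip}

def glue(spec, managers):
    # Fan one build description out to every requested package manager.
    return dict((m, EMITTERS[m](spec)) for m in managers)

spec = BuildSpec("pyne", "0.3", ("numpy", "hdf5"))
outputs = glue(spec, ["conda", "pip"])
```

Real package managers need far richer metadata, but the shape of the problem is this fan-out from one source of truth.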
The list of platforms we need to target is fairly easy: Linux, Mac OSX, and Windows.

The list of package managers that we need to target at a minimum is as follows: apt, conda, macports, homebrew, pip/easy_install.
Others should follow fairly easily from these. I dream of a world where every scientific computing package (not just the Python ones) will be installable via all of these mechanisms. This suite is chosen to support the major Linux package manager, the two major Mac OSX ones, and two cross-platform package managers (pip & conda) so that we have coverage of Windows. I'll leave the relative merits of pip vs conda for someone else to detail, but between them we have good coverage of user-space & system-wide installations.
The two remaining choices are "What build manager do we choose?" and "How do we make it easy to automate building on multiple different platforms?"

How we automate can be done in a number of ways that are well understood. I think that BaTLab is nice because it is free and managed. This would get us a nice first cut. If we end up needing to move to some other, larger system which we manage ourselves, we can do that at a later date. Note that I have been working on this topic in the past couple of weeks, even though I withheld sending this email. In this time, I built a GitHub-to-BaTLab web service similar to Travis-CI called Polyphemus (http://polyphemus.org/). It is still a bit experimental and we are working out the kinks for Cyclus right now. After these are hammered out, we'll do a release. This should be easily extendable to Bitbucket as well, if there are any brave souls.
What build manager do we choose? This remains the big question to me. There are three primary options in my mind: cmake, hashdist, and conda. I am willing to consider other options, but I am not willing to pick more than one to start with. The fourth option, starting a new project to perform the task of gluing build managers to package managers, is currently untenable since we haven't fully explored the existing options.
Also, to answer your question Travis, I personally have had a lot of trouble installing PyNE on top of conda in the past. I have tried many times over the past couple of years.
I have typically been able to get it working eventually, though it often involves installing my own version of HDF5 and then doing custom path edits to make PyNE see my version of HDF5 rather than conda's. Obviously this is not a solution that is ready for users. Because it wouldn't have worked with conda's HDF5, I didn't try building a PyNE conda package.
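For the record, the kind of path surgery this involves looks roughly like the following; the variable names and paths are my illustration, not PyNE's documented interface:

```
# Illustrative only: point a from-source build at a privately built HDF5
# rather than the copy conda installed; names and paths are assumptions.
export HDF5_DIR=$HOME/opt/hdf5
export LD_LIBRARY_PATH=$HDF5_DIR/lib:$LD_LIBRARY_PATH
python setup.py install --user
```

This is exactly the sort of manual intervention a proper build-to-package pipeline should make unnecessary.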
Hashdist comes up here because Aron and I had a call about it right before SC13. It was during this call that the distinction between the user's needs and the developer's needs really came out. In my previous discussions with Dag about the role of hashdist, it became clear to me that it really was not a package manager or a user tool. It also wasn't clear to me at the start of talking to Aron how to go about connecting hashdist to the various package managers, and (he can correct me if I am wrong) I don't think it was clear to Aron that front-ending to package managers was a valuable thing to do. However, by the end of that conversation we had come to the agreement that, barring anything else, if Aron could get a couple of front ends working (say brew and cygwin, since he is familiar with those) in the next month or so, then we would have a pattern to build on for the other package managers.
The Proposal / Next Steps:

I think that we should wait for Aron to create proof-of-concept front ends for certain package managers in hashdist. Meanwhile, I'll work on creating a service for submitting builds to BaTLab on the platforms mentioned. I'll also work on drumming up financial support, gathering support from the various projects, and investigating what it will take to interface with the various other package managers.
Other Thoughts:

Whatever is built as a result of this effort needs to make life noticeably better.

Travis, I am still deeply interested in using conda as a build manager. If you strongly feel that this would be a more successful choice or less work than hashdist, I'd love to hear the reasons. Note that whichever build manager we go with, I'll still be targeting conda as a package manager, since it is open source and supports user-space installs on Windows.
Hi Travis,
I'm going to reply to a few items you've raised as they relate to the
original topic of a community build service, but I also think at this
time I'm prepared to move any further discussion of the issues we've
specifically had with conda to the anaconda-discuss list.
On Mon, Nov 25, 2013 at 4:25 PM, Travis Oliphant <tra...@continuum.io> wrote:
> Thanks for the feedback and example. I think this actually illustrates my
> points very well about the misunderstandings that exist with conda. Conda
> is a package manager that can integrate with a wide variety of build tools
> -- and therefore help with the build management.
>
> The problem is *conflating* conda, the tool, with our particular set of
> builds available in Anaconda. You can use conda and never use any of our
> Anaconda binaries --- you can also use Anaconda binaries without really
> using conda as well.

Yes, I can see how this is the case. Conda is a build system, and in
fact, we went a long way down the road of providing an "alternate
universe" of packages. I'm happy to provide links to the extensive
mailing list discussions where this was brought up, or to provide
information about the methods by which we were building packages using
VMs and CI servers. But where it ended was: we do not have the
resources to provide this. And I think that's where what Anthony
raised is important -- regardless of the system by which packages are
distributed, built, and managed, if binary packages are to be
provided, someplace to build them has to be available.
I also don't think that we can easily discount the point he made,
which I am *extremely* concerned about, which is that the more work,
magic and effort required to support a transition between "user" and
"developer" of packages living in an ecosystem, the fewer transitions
that will occur.
We really, really wanted to use Conda for this release. We spent a
considerable amount of time testing, developing scripts to install
Miniconda seamlessly, building recipes, understanding the build
system, spinning up VMs, deploying these on our CI server, requesting
testing from others, reporting issues on github, submitting recipes,
trying to engage on the anaconda mailing list, and on and on.
And, in fact, we wanted to use conda so badly that we decided we would
*rewrite* a relatively large and fundamental routine in our code base
so that we would no longer have to have a build-time requirement of
linking a C library against HDF5. Unfortunately, this change could
not be ported to our older code base and can only go in our new
version, and so we ended up deciding that we would start encouraging
Miniconda/Conda/Anaconda as a deployment strategy for the *next*
version of our code.
I guess at this point, what I'm saying is: we really want to use
Conda. But even for our use case, we weren't able to make it work on
the time and energy constraints available to us as a project. In
order to make it work for our use case, we actually modified our
project in a non-trivial way, but in a way that's not available to
PyNE.
> In the end building useful packages for others is hard. Conda gives you a
> way to store the meta-data about how to do that, create environments within
> which to try it out, and binstar provides a place to host those binary
> artefacts.
>
> The feedback is very much appreciated. We welcome any input and help with
> improving conda.

We've very much tried to provide input and help with Conda, and I hope
I do not sound like I am denigrating the efforts put forth by
Continuum. I'm conscious of the fact that Conda is a package that is
provided free of charge, but what I'm attempting to identify here are
the ways that the system Anthony has described meets different needs
and addresses slightly different problems.
Something that is perhaps unspoken here is that the POV of the
individual or organization providing a piece of software, attempting
to gain uptake of that software, and also attempting to foster an
ecosystem is likely very different from the POV of the packaging
provider. As a project member, my vested interests are in making it
very easy for anyone to install the project I work on, to use it, and
most importantly to contribute changes back upstream. So if someone
comes to me and says, "I would like to install your project using
MacPorts" or "I have brew installed my Python the way I like it, how
do I get your project into that system?" I actually very much want to
make that happen. It's much harder for me to say, "You should cease
using MacPorts and install Conda."
But what's come out from your emails and comments on the issue is that
you would like to see Conda, as a build system, emit the necessary
metadata for MacPorts/Apt/etc, or to be able to live within and/or
drive that ecosystem. I admire that, I hope that this discussion will
bear fruit, and I'd like to support a community build service in
whatever way I can.
Hey all,

Building for Windows is a pain in the neck, as most who have had to do it can attest. Companies like Continuum and Enthought typically have to dedicate full-time resources to it to make sure it happens for their distributions. Christopher Gohlke has done an amazing service with his Windows installers.

In a recent Twitter conversation, Wes McKinney emphasized how difficult it is for pandas, for example, to continue to provide Windows binaries for their project. I have seen the same problem for NumPy and SciPy, and I'm quite confident that other projects feel the same pain. The move to Python 2 *and* 3 will only amplify that pain for every open source developer shipping a project.

The time is right to solve this problem. I think we can use NumFOCUS as a coordinating organization to work with companies like Rackspace, Intel, and Microsoft and with community leaders to provide a binary build service for all the NumPy-stack projects. I think we can use the public conda recipes for creating Windows packages, and Continuum is willing to provide any insight we have from our current Anaconda recipes.

At its core, this means at least:

1) Provisioned machines for building on at least Windows, Mac, and Linux (but starting with Windows).
2) Public and available recipes to build on major platforms.
3) Good Fortran/C/C++ compilers available on these machines.

The goal would be to allow developers to get "mostly automatic" builds on all platforms. Some tweaking would be necessary to really get things to work, but between all the people already involved we could create "standard build recipes" for all packages and then allow access to dev build machines to help debug the trickier builds.

One thing that has to be resolved, of course, is "what to build". I would strongly encourage and support building conda packages. We have already started some work on a build service to create conda packages on other platforms from those uploaded for one platform. The intent is for this service to be free to open source projects, but we would be happy for a NumFOCUS-supported program to exist. Our reason for doing this is to help the community.

We may need to be able to generate .whl packages as well (conda allows packaging things .whl was not intended for --- Python itself, for example, but also the C and C++ libraries that are needed for many scientific packages). Regardless, it is straightforward to produce .whl packages from conda binary packages, so if we get conda packages built, then .whl packages are easy to make from them too.

We may also need to generate "Windows executable installers", though I'd really like to deprecate those, as I don't think they are the right answer for Windows. Conda (and even pip with .whl files) will be a much better approach for Windows users.

binstar.org is already available as a binary artifact repository for use by the system.

We need someone to head this up. If you are interested, please respond either publicly on this list, or to ad...@numfocus.org, or to me personally if you would like discretion. Perhaps NumFOCUS could buy someone's teaching time from them.

We can work with vendors to get machine time and the necessary compilers, and I'm sure once we advertise this program we can get money from industry as well. But we will need resources for the person heading this up. I estimate the cost of this at a minimum of $80k to get started (mostly in the time of the person heading this up) and about $20k/year in ongoing costs. I would welcome other estimates; I suspect I'm being too lean.

Continuum cannot foot the entire bill for this beyond what we have already done in creating conda and Anaconda --- and I suspect nobody else can either. But together we can pull this off as a community.

Best,

-Travis

For those needing a refresher on just how much conda does for this problem, here are my slides on conda presented at the recent PyData conference: https://speakerdeck.com/teoliphant/packaging-and-deployment-with-conda