|Software Carpentry, stacks, and Anaconda||Greg Wilson||10/11/12 10:24 AM|
The biggest hurdle we face when teaching Software Carpentry is
installation hell. Learners always want to use their own machines rather
than SSH'ing to a server we set up or using a VM so that they leave the
workshop with a complete working environment, but that means we're
always faced with two versions of Windows, three versions of Mac OS X,
Ubuntu, and (at least) one other Unix distro whose name I'm likely to
Since 2010, we've handled this by getting Windows users to install
Cygwin, and relying on the fact that Bash and the command-line SVN and
SQLite clients come with all sane Linux distros and Mac OS X. However:
1. Mountain Lion (Mac OS X 10.8) doesn't include SVN (or any other
version control client).
2. There isn't a DMG installer for it.
3. So we often have 3-4 students trying to download and install XCode
(or some portion thereof) at the same time using university-quality
WiFi, with predictable results.
4. And Cygwin has never worked as well as we'd like (too many things
aren't in the default install list).
5. And anyway, we want to be able to teach NumPy/SciPy/Pandas on the
afternoon of day 2 instead of SQL (depending on the audience), which
makes for even more pain.
We've therefore been experimenting with EPD, with mostly good results,
but the academic/non-academic licensing is a pain (we often have mixed
classes). This makes Anaconda look really interesting, but:
1. If the Windows install doesn't include MinGW or something equivalent,
so that people can build modules without doing a separate install, it's
2. It doesn't address the rest of the stack. In particular, it doesn't
include either command-line or GUI clients for SVN/Git/Hg (the three we
teach, in order of use).
I believe we could attract and retain more newcomers if one install
would give them a complete minimal dev environment. Do other people
on this list think this is worth shooting for, or is it mission creep?
 Software Carpentry isn't a Python course per se, but a computing
skills course: our typical curriculum covers the shell, version control,
testing, and databases as well as basic Python.
 It would _really_ rock if that environment included a smart editor
and graphical debugger for Python, but I'm trying to be realistic here,
and hopefully IPython Notebooks will take care of that for us Real Soon :-)
|Re: Software Carpentry, stacks, and Anaconda||Aron Ahmadia||10/11/12 10:27 AM|
Greg, is there a reason EPD Free doesn't work for you?
|Re: Software Carpentry, stacks, and Anaconda||Chang She||10/11/12 11:43 AM|
Mainly because of pandas: EPD Free doesn't come with pandas and even EPD Full comes with pandas 0.7.3 which doesn't include any of the new timeseries API (v0.8+).
That's where we had the most problems with students on Mac. They were instructed to get academic access for EPD Full, but a lot of them just installed EPD Free instead. So even cutting out the timeseries API from the pandas curriculum wasn't enough, and we had to tell them to download Xcode, making sure that they installed the command-line tools with it, then install, etc etc etc. On Windows it wasn't as big a deal because pandas has Windows binaries already.
|Re: Software Carpentry, stacks, and Anaconda||Fernando Perez||10/11/12 3:34 PM|
On Thu, Oct 11, 2012 at 10:24 AM, Greg Wilson <gvwi...@third-bit.com> wrote:
Have you guys looked at the GitHub app for Windows? It provides both
a nice GUI git client *and* it's completely self-contained:
It ships both a nice GUI app *and* a mingw shell in one shot, so you
can teach the command-line git but users can also pick up skills with
the GUI if they prefer. It doesn't include gcc, so it's not a full
mingw, but as long as you're not planning on doing full-blown C-based
work right off the bat, it might be a solution...
|Re: Software Carpentry, stacks, and Anaconda||Jonathan Rocher||10/11/12 8:17 PM|
thanks, this is great feedback about your needs. You guys are doing an awesome job on training, and we want to support these kinds of effort.
Let me mention that EPD Full currently contains pandas 0.8.1 and has contained 0.8.0 for a while. Remember that between EPD releases, you can use enpkg to update packages. We are now working to have 0.9 included as soon as possible.
On the code editor part, let me mention that EPD8beta/EPD8beta Free (soon to be replaced by its successor) comes with a multi-language text editor with ipython qtconsole integrated. It is for now still lacking the graphical debugger but that's in the works.
Forgive me for the sales speech, but I thought it was appropriate since you are asking about distributions.
Jonathan Rocher, PhD
Scientific software developer
|Re: Software Carpentry, stacks, and Anaconda||Nicolas Pettiaux||10/12/12 12:56 AM|
The question you raise, Greg, was a concern at this year's
EuroScipy 2012. People following mostly the advanced track did not
have the information or at least the tools installed, and even with
the information, some could hardly do it because of dependencies or
We are considering building, for next year's conference, a virtual machine
that we will provide, with everything installed correctly and
available on a disk everyone can connect to.
If the image is a VirtualBox one, and we have all the VirtualBox runtime
environments (i.e. for the different Windows, Mac and GNU/Linux
ubuntu/debian/redhat/fedora/suse versions), that would let everyone work and
leave the conference with a fully set up, locally installed environment.
But I have not completely understood what you meant by the VM in
your message. Possibly you already do this?
Nicolas Pettiaux, dr. sc - gsm : +32 496 24 55 01
Lepacte.be - « promouvoir les libertés numériques en Belgique » - hetpact.be
|Re: Software Carpentry, stacks, and Anaconda||Anthony Scopatz||10/12/12 3:07 PM|
I just want to second Greg's concerns here in regards to the
minimal, core stack being too minimal and not core enough.
While I am disappointed in the exclusion of HDF5 from this,
I am democratic enough to submit to the will of the people ;)
What makes things difficult is that, given the stack as it stands,
it would be impossible to get the packages I need using just
the mechanisms provided on new systems. So the core
distribution isn't a significant step up from installing everything
from scratch (which is what I do right now). That said:
1. Having a small core distribution is important!
2. I have some thoughts on a lightweight, stable & bleeding edge
package manager that I will be implementing for some of my
projects because current tools don't meet the need. I will make this
public once it is ready.
But to even make this useful on Windows (and maybe Mac?) would
is a compiler (probably gcc via mingw, though clang via llvm would
be an interesting choice) as well as the version control.
To Jonathan's point about EPD Full/Academic, this is a good solution for
most of my needs. However, it still lacks version control, which I need to
install on my own. But it is the mechanism I have been using to-date ;).
So I guess I am petitioning for some executive action in this democracy!
|Re: Software Carpentry, stacks, and Anaconda||Dag Sverre Seljebotn||10/12/12 10:46 PM|
On 10/13/2012 12:07 AM, Anthony Scopatz wrote:
Huh? I actually didn't connect Greg's post with Thomas' efforts at all,
they're rather separate things? Greg's post read more like a petition
for EPD and Anaconda to include more stuff to me.
Thomas' efforts will make it easier to specify the dependencies of your
code, collapsing 3-5 of your dependencies to a single item in your
manual. Also, you can more easily bump your dependency requirements for NumPy,
SciPy, etc. in lock step, rather than having to think about which SciPy
matches which NumPy and so on.
Also, it's a bit of a PR and user-friendliness initiative.
Interesting. I'd love to hear more about what makes this different from
all the other attempts out there.
The funding for Hashdist (https://github.com/certik/hashdist/wiki) is
slowly coming along (it's been stuck in a bureaucratic mill for ~1 year
now) and I'm optimistic about being able to spend two months full-time
on that soon (before the end of the year). Fingers crossed.
While Hashdist will be targeted for HPC clusters initially, that's just
due to the funding source and my own needs -- there's nothing stopping
the approach from working well with Windows and Mac pre-compiled
packages if somebody interested steps up.
What makes me optimistic about Hashdist is that it's a new idea in this
domain (although Nix http://nixos.org provides prior art), rather than
new wrapping around the same old concepts.
For me it's all a matter of getting some funding through. I think the
problem is way too hard to be left to spare-time efforts.
|Re: Software Carpentry, stacks, and Anaconda||Dag Sverre Seljebotn||10/12/12 11:32 PM|
|Re: Software Carpentry, stacks, and Anaconda||Gaël||10/13/12 2:34 AM|
On Fri, Oct 12, 2012 at 05:07:39PM -0500, Anthony Scopatz wrote:
Why? I can install everything I need on most reasonably recent Linux
systems using the package manager.
I have the impression that the developers keep asking for bleeding-edge
versions of the packages to be part of the common stack. As a result, the
difficulty of installing this common stack grows a lot. Yes, it is handy to
have the latest scipy (graph library, yey!) or the latest numpy
(linalg.slogdet, I love you). However, if we are discussing advanced
developers, it is easy to maintain a compatibility and backport layer.
If we are discussing beginners, maybe there is more benefit for them
to have a stack that is easily installed than to have the new features.
My 2 cents,
 See for instance
|Re: Software Carpentry, stacks, and Anaconda||Thomas Kluyver||10/13/12 2:42 AM|
On Friday, October 12, 2012 11:08:00 PM UTC+1, Anthony Scopatz wrote:
I just want to second Greg's concerns here in regards to the
You probably know what I'm going to say: I don't think it's the place of the Scipy stack standard to define something as complete as Greg wants. That's better handled by a distribution. By analogy, Linux Standard Base doesn't specify everything you'd expect in a Linux distribution.
That said, I hope the standard improves this situation somewhat, by being a consistent core for distributions. For instance, I hope that Enthought will consider adding pandas and sympy to EPD Free, so that it meets the standard.
|re: Software Carpentry, stacks, and Anaconda||Greg Wilson||10/13/12 5:33 AM|
In answer to Nicolas Pettiaux's question, we have tried giving students
virtual machines with everything installed. On the one hand, it means
they only have one installation challenge to get past (though they'd
better all do it *before* class starts, because having two dozen people
download a VM image on university-quality WiFi simultaneously is a
recipe for frustration). On the other hand, it means they leave class
*without* their own laptop being set up, which means that when they get
back to their office and want to use what we've shown them with
everything they're already used to using (e.g., their MATLAB-on-Windows
tools), they either have to keep switching back and forth between our VM
and their "real" environment, or figure out how to install stuff in
their environment on their own. In summary, I think it makes things
easier in the short term for everyone, but harder in the medium term for them.
|Re: Software Carpentry, stacks, and Anaconda||Nicolas Pettiaux||10/13/12 5:48 AM|
2012/10/13 Greg Wilson <gvwi...@third-bit.com>
True, unless you provide the VM and every needed tool on USB keys that can be copied quickly and easily to everyone's hard drive.
On the other hand, it means they leave class *without* their own laptop being set up, which means that when they get back to their office and want to use what we've shown them with everything they're already used to using (e.g., their MATLAB-on-Windows tools), they either have to keep switching back and forth between our VM and their "real" environment, or figure out how to install stuff in their environment on their own.
This is true. As Gael said, one of the easiest solutions for many people/beginners would be to move to a decent GNU/Linux distribution instead of Windows, which has, at least I think so, poor program management, without dependencies. But then they would have to get to know another environment for most of the other things they usually do (in the long term it would probably become simpler for most scientists to have GNU/Linux than Windows).
In summary, I think it makes things easier in the short term for everyone, but harder in the medium term for them.
Indeed. So it depends on the objectives. For the EuroScipy conference, we will try both: a VM for the lazy but easy route, and a description of the needed packages for those who ultimately want a decent working environment.
We know that it will not be easy to satisfy everyone.
But for the latter, I would recommend: switch now to Ubuntu.
|Re: Software Carpentry, stacks, and Anaconda||Dag Sverre Seljebotn||10/13/12 8:21 AM|
On 10/13/2012 07:46 AM, Dag Sverre Seljebotn wrote:
Woah, this sounds like sour grapes, that wasn't the intention at all,
I'm sorry. I am *very* interested in hearing about your ideas and needs
for package management and would love to Skype about this at some point.
|Re: Software Carpentry, stacks, and Anaconda||Andy Terrel||10/13/12 11:07 AM|
I think setting up a google hangout or conference call at some point
to chat about this would be very good. There are a lot of people with
different notions of packaging out there. I had a conversation with
Jesse Noller about this, and he is definitely interested in finding
funding for this problem to be solved. Unfortunately, solved means
many different things in our community. For example, every org I've
been involved with has their own solution (and one of them has four
different python packaging solutions).
|Re: Software Carpentry, stacks, and Anaconda||Dag Sverre Seljebotn||10/13/12 12:08 PM|
A conference call for scientific users/the NumFOCUS crowd, definitely.
Less sure about the Python crowd at this point; as you say, solved means
a lot of different things.
I mostly just wanted to check with Anthony if there was any overlap in
our goals and if there could be some cooperation.
I'm with Greg -- to me it's more about "scientific software stack"
("what do I tell scientists when I give tutorials", "how do I deal with
setting up clusters I have access to"). To me it's just as much about
how I push the version I need of git or Elemental or PETSc as it is
about Python software. At least that's where I see my own efforts.
I do notice that python-dev is on a better track these days; seeing as
they rejected distutils2/packaging for Python 3.3 and the 'wheel' spec
and so on. But if there's a chance of serious funding from PSF, I think
just getting funding for "Python packaging integrating with real build
tools" -- for which the likely candidate is Bento and David Cournapeau
-- involved more directly is better. At least I don't see a role for
myself in a situation where the goals are defined purely by how Python
software is distributed, and not the C/Fortran dependencies.
|Re: Software Carpentry, stacks, and Anaconda||Chris Kees||10/13/12 12:25 PM|
I'd be happy to join in a phone call or hangout the week of Oct 22. I agree with many of Greg's and Anthony's comments that the current spec for the core stack is too minimal for my needs, but I support what Thomas is trying to do and am not trying to force the pylab core spec to meet my spec. It may be best to keep it minimal, but we might want to consider including 1) tools and documentation in the core on customizing the stack (e.g. virtualenv or something like that), 2) solid advanced tools that show our community has a vision and is actually ahead of commercial environments in many ways (e.g. pytables, cython, mpi4py), and 3) tools for users to easily verify that a distro meets the pylab spec (nose is a start I guess).
I worry that an overly minimal stack could backfire on HPC folks and gov users on locked-down machines in particular, in the following way: a community-designed spec gives sysadmins and procurement folks something to aim at. They will say "the new machine has an interpreter that meets the pylab spec, we've addressed the needs of you python guys". If the spec is so simple that it doesn't allow assessment of parallel scaling and large data performance, then instead of focusing and leveraging resources on addressing our needs, it may actually provide an excuse for not helping real HPC python developers with serious problems. When my current stack fails with some compute node issue I often get "why aren't you using the system python" as the first response instead of actual attention to the error messages.
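For point 3 above (letting users verify that a distro meets the spec), a rough sketch could be as small as the script below. The package list is an assumption for illustration only, since the actual spec is whatever the list agrees on, and the function names `check_stack` and `missing_packages` are hypothetical.

```python
import importlib

# Assumed list for illustration; these are import names, not
# distribution names, and not the agreed spec itself.
ASSUMED_SPEC = ["numpy", "scipy", "matplotlib", "IPython", "pandas", "sympy", "nose"]


def check_stack(import_names):
    """Map each import name to its version string, or None if importing fails."""
    report = {}
    for name in import_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Usually ImportError, but a broken binary install can raise others.
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


def missing_packages(report):
    """Return the sorted names of packages that could not be imported."""
    return sorted(name for name, version in report.items() if version is None)


if __name__ == "__main__":
    report = check_stack(ASSUMED_SPEC)
    for name in ASSUMED_SPEC:
        print("{0:12s} {1}".format(name, report[name] or "MISSING"))
    print("missing:", missing_packages(report))
```

This only checks that each package imports and reports a version; it says nothing about parallel scaling or large-data performance, which is exactly the gap described above.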
|Re: Software Carpentry, stacks, and Anaconda||Andy Terrel||10/13/12 12:32 PM|
|Re: Software Carpentry, stacks, and Anaconda||Thomas Kluyver||10/14/12 3:12 AM|
On Saturday, October 13, 2012 8:25:26 PM UTC+1, Chris Kees wrote:
This is the case even if we make the spec much bigger, though: it's never going to cover everything you require. In fact, a larger spec could make this worse: by appearing to be more comprehensive, it might make sysadmins less ready to accept that you need extra packages.
Drawing on Almar's suggestion, maybe HPC users should work out an additional 'scipy+hpc' package set, that you can ask sysadmins to aim for. But I don't think this belongs in the core Scipy stack.
|Re: Software Carpentry, stacks, and Anaconda||Andy Terrel||10/14/12 7:26 AM|
FWIW, I've asked Andreas Schreiber to give us the pyHPC.org domain
for branding an hpc stack. If Ondrej and Dag build out hashdist in
the coming year, Aron and I can help promote it to vendors.
|Re: Software Carpentry, stacks, and Anaconda||Josef Pktd||10/16/12 10:32 AM|
Windows user here.
I don't understand these arguments that Windows users need to use cygwin or a Linux in a VM.
I have no idea how to use either of those (almost).
pythonxy installs everything automatically, including MinGW (32 bit only).
Pierre has now also created scientific python on a stick: https://groups.google.com/d/topic/spyderlib/2kXXOykf5yg/discussion
As long as we don't try to compile heavy stuff, everything is just a few downloads and clicks away.
(just checking in)
|Re: Software Carpentry, stacks, and Anaconda||Anthony Scopatz||10/18/12 2:23 PM|
Sorry for not responding to this sooner. (I guess I have been swamped this past week.) There are too many points now to address individually, so I will simply speak generally.
I think a hangout / call to discuss this is a great idea.
To clarify my view, I think that the distribution (which Thomas has illustrated) should provide the basic tools to get started and on top of which other codes may be built. In my ideal world, this includes not only Python and the basic SciPy packages but a compiler, version control systems (git, hg, svn), and a scientific package manager (hashdist, my soon-to-be-thing, something else). Maybe these components will only optionally be installed if they are not already on the user's system (which in many cases, like windows, they are not). If you had these components you would basically be unstoppable. Anything that the user didn't have would be easily installable via the package manager. This would take the pressure off of putting extraneous or domain specific packages in the distribution.
Speaking to Greg's point, if we had a distribution like this, it would become the de facto standard way of teaching, using, and developing scientific python.
Dag, I do believe there is a lot of overlap with these goals and a space for collaboration. I'll take a deeper look at hashdist.
|Re: Software Carpentry, stacks, and Anaconda||Stefan van der Walt||10/18/12 2:51 PM|
On Thu, Oct 18, 2012 at 2:23 PM, Anthony Scopatz <sco...@gmail.com> wrote:
I concur: having a compiler and revision control as part of the standard
could make all the difference. All of a sudden, each user is a
potential contributor, and has the tools required to bail him/herself
out of a sticky situation [*].
[*] On Unix-like systems, I can almost always solve someone's problem
over the phone, because there's so much flexibility in what they can
do due to available tools. On Windows, simply because you don't know
what is available, it's a much harder problem.
|Re: Software Carpentry, stacks, and Anaconda||Anthony Scopatz||10/19/12 8:02 AM|
Looking at the doodle poll, it seems as if we can narrow this down to the following four times (all in US Central timezone):
Anything after Tuesday is right out. Anyone want to declare a winner? I'd say Tuesday at 11:00.
|Re: Software Carpentry, stacks, and Anaconda||Andy Terrel||10/19/12 8:08 AM|
Let's start a new thread on this. I was going to send out a basic
agenda and invite a few folks who might not be on the list.
|Re: Software Carpentry, stacks, and Anaconda||Anthony Scopatz||10/19/12 8:12 AM|
On Fri, Oct 19, 2012 at 10:08 AM, Andy Ray Terrel <andy....@gmail.com> wrote:
Sounds good to me. I'll let you handle this then.
|Re: Software Carpentry, stacks, and Anaconda||Yung-Yu Chen||10/19/12 8:19 AM|
On Fri, Oct 19, 2012 at 5:23 AM, Anthony Scopatz <sco...@gmail.com> wrote:
I want to echo the point of having a (scientific) package manager. In fact, I think a powerful package manager is much more important than a distribution to me. I have maintained an in-house package manager with 70+ packages for an internal software team of a non-research unit. Having a large cluster consisting of multiple phases of hardware and OSes, we need to build many dependencies from source, like gcc, FFTW, git, Qt, etc., and of course Python. The system just works, but I doubt that my colleagues would be excited about the Makefiles, shell scripts, and patches dwelling in the flat structure.
For this specific use case, it would be very beneficial to have a package manager that allows (i) developers to install multiple runtime environments for development and testing, (ii) users to install necessary packages from a preset "recipe" that enables certain applications, and (iii) cluster administrators to provision maintainable runtimes for system-wide applications. Our software team is small and can't afford a dedicated release engineer or toolsmith. Our sysadmin is already overloaded and can't support us. A suitable package manager would be just great.
I think the situation is similar in a research institute. Although the hardware and OSes are quite homogeneous in a supercomputer, deploying a code on multiple systems is still time-consuming. Installing dependencies and checking for consistency over time are not fun at all.
Although the thread is about a flexible, cross-platform scientific Python runtime environment, I think a more beneficial approach could be implementing a package manager (like what hashdist would do, I suppose), and letting people create their own runtime by using its facilities. If the package manager is to be implemented in Python, we might have a bootstrapping issue in some cases, but it could be mitigated by adding stages to the build process.
In any case, I am very interested in the work to improve the status quo of deploying scientific packages and am willing to join the hangout/confcall.
|Re: Software Carpentry, stacks, and Anaconda||Jonathan March||10/31/12 12:41 PM|
An update on the current capabilities of EPD Free and EPD Academic, directly addressing questions and concerns in this thread --
1. EPD's user account system has, for some months, provided Free accounts (not just Free downloads). These accounts allow registered Free users to update any EPD Free package to the latest repository version, by using either the enpkg command-line utility or the EPD 8 beta GUI package manager.
2. This system now also provides free Academic subscriptions (not just downloads as in the past), which allow updating any EPD package to any version in the repository, by using enpkg or the beta GUI package manager.
3. Thanks to the SciPy base package stack discussion and poll, we will include the additional consensus packages, specifically pandas and sympy, in EPD Free. Free repo update eggs for these packages should be available by early next week.
4. From (2), it follows that pandas 0.8.1 (and much more) is now available in the repo to Academic subscribers.
5. From (1) and (3) it follows that pandas 0.8.1 will shortly be available in the repo to registered Free users.
6. Pandas 0.9 is targeted to be in the repo early next week.
|Re: Software Carpentry, stacks, and Anaconda||Thomas Kluyver||10/31/12 3:44 PM|
On 31 October 2012 19:41, Jonathan March <jma...@enthought.com> wrote:
Thanks, I'm really glad to see the core set of packages becoming
something distributions aim for. I'll update the details about EPD at
I'm also happy to see accounts for free users - I set a colleague up
with the academic version last year, and was disappointed that the
next version had to be freshly installed.
|Re: Software Carpentry, stacks, and Anaconda||Fernando Perez||10/31/12 4:46 PM|
On Wed, Oct 31, 2012 at 3:44 PM, Thomas Kluyver <tak...@gmail.com> wrote:
+1, this is great news to hear; many thanks for keeping an eye on
these discussions and reacting in this direction!
|re: Software Carpentry, stacks, and Anaconda||Greg Wilson||11/1/12 3:27 PM|
+1 from us as well --- thanks!
|Re: Software Carpentry, stacks, and Anaconda||Jens Timmerman||11/11/12 9:31 PM|
I just discovered this group today, at SC12 (Aron Ahmadia pointed me here).
I'm part of the HPC team of Ghent University and we've been working on an installation framework (EasyBuild) for over 3 years. We open-sourced it in April 2012, and have had a 1.0-rc out since last week.
My colleague Kenneth Hoste will be presenting our paper and announcing 1.0 at PyHPC on Friday.
Our framework takes configuration files and an optional python class implementing the build and installation procedure to: download, build, install and create an environment-module file.
After a successful build these configuration files are committed in a svn or git repository and the buildlog is saved in the installation directory.
We have currently made public 144 packages that we can build, and are adding new ones regularly.
Python is built with numpy and scipy by default, and e.g. matplotlib is included as an 'extension'.
All these packages can be built with either gnu or intel compilers/toolchains (and it is straightforward to add another compiler/toolchain)
Since this thread is about python: we currently can install these packages (and all their dependencies) as separate 'extensions':
ASE CVXOPT Cython Docutils DOLFIN FFC FIAT GPAW h5py Instant Jinja2 petsc4py python-meep ScientificPython setuptools Shapely Sphinx Theano UFL Viper UFC boost and libxml2
CVXOPT Cython Docutils DOLFIN FFC FIAT GPAW h5py Instant Jinja2 libxcb libxml2 Mesa PETSc petsc4py Python python-meep ScientificPython setuptools Shapely SLEPc Sphinx SWIG Theano Trilinos UFC UFL Viper xcb-proto
Concerning Windows support: our only dependency (except for Python >= 2.4, < 3.0) is environment modules, since we're focussed on HPC installation. However, it should be straightforward to add an option to skip creation of these and instead just spit out a PATH, LD_LIBRARY_PATH etc. variable to set.
We don't see it as a package manager as such, but it does indeed allow (i) developers to install multiple runtime environments for development and testing, (ii) users to install necessary packages from a preset "recipe" (easyconfig) that enables certain applications, and (iii) cluster administrators to provision maintainable runtimes for system-wide applications.
Our software team is too small to add to this package manager what everyone out there might need, but we want to share our 'recipes' (as easyconfigs), 'installation procedure implementations' (as easyblocks) and patches (patches) in the hope that people will use them, and share back their own!
So I'm looking forward to hearing some of your feedback here!
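To make the "just spit out a PATH, LD_LIBRARY_PATH" idea concrete, here is a rough sketch of what such an option might emit. This is not EasyBuild code: the function name `env_exports`, the `bin`/`lib` layout under the install prefix, and the example prefix are all assumptions for illustration.

```python
import os
import posixpath


def env_exports(prefix, current_env=None):
    """Build shell `export` lines that prepend an install prefix to search paths.

    Assumes a conventional <prefix>/bin and <prefix>/lib layout.
    """
    env = os.environ if current_env is None else current_env
    additions = [
        ("PATH", posixpath.join(prefix, "bin")),
        ("LD_LIBRARY_PATH", posixpath.join(prefix, "lib")),
    ]
    lines = []
    for name, new_dir in additions:
        existing = env.get(name, "")
        # Prepend so the freshly installed software wins over system copies.
        value = new_dir + ":" + existing if existing else new_dir
        lines.append('export {0}="{1}"'.format(name, value))
    return lines


if __name__ == "__main__":
    # Hypothetical prefix, with a fixed environment so the output is predictable.
    for line in env_exports("/opt/sw/foo/1.0", {"PATH": "/usr/bin"}):
        print(line)
```

A user would paste or `source` the printed lines in a Unix shell; a Windows variant would need a different separator and syntax, which this sketch does not attempt.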
Or join us on #easybuild on freenode, or mail us 
HPC UGent team
|Re: Software Carpentry, stacks, and Anaconda||Kenneth Hoste||11/12/12 4:07 PM|
Just a small addition: Jens already mentioned that we'll be presenting EasyBuild at the PyHPC workshop at Supercomputing in SLC this Friday.
We'll also give a lightning talk on EasyBuild during the Python Birds-of-a-Feather session tomorrow (Tue. Nov. 13th) @ SC'12.
Slides will be made available afterwards, and do feel free to come up and talk to me and/or Jens during SC'12 if you're here.