Name for a meta-package of scientific python packages


Travis Oliphant

May 24, 2012, 7:12:13 PM
to numf...@googlegroups.com
Hey everyone,

Fernando had the great idea to create a named meta-package with dependencies on a core set of packages that NumFOCUS is supporting. The list of packages in this meta-package would not be as large as, say, EPD, Python(X,Y), or Sage, but it should be a decent subset.

Here is a brief (but not necessarily complete) list:

NumPy, SciPy, Matplotlib, IPython, Scikits-Learn, Scikits-statsmodels, Pandas, scikits-image, SymPy, Cython.

The point of this meta-package would be to create a concrete release number that is tied to specific releases of the other packages and give us a name to talk about the group of packages as a whole (when citing, etc.).

The meta-package would have recommendations for documentation, continuous integration, and possibly even some utilities that make it easier to see the projects as a coherent group.
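
To make the idea concrete, here is a rough sketch of what such a meta-package could look like as a plain setuptools package (the name and the exact version pins below are placeholders for illustration only, not a proposal):

# Hypothetical meta-package setup.py; the name and all version pins are illustrative only.
from setuptools import setup

setup(
    name="sciome",            # placeholder -- the name is exactly what this thread is about
    version="1.0",            # the single "concrete release number" for the whole stack
    description="Meta-package pinning a coherent set of scientific Python releases",
    install_requires=[
        "numpy==1.6.2",       # example pins; a real release would list the agreed-upon versions
        "scipy==0.10.1",
        "matplotlib==1.1.0",
        "ipython==0.12.1",
        "pandas==0.8.0",
        # ... plus the rest of the core list above
    ],
)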

We need a name for this package. I am proposing the name Sciome. I would love to hear other ideas.

Best regards,

-Travis

Warren Weckesser

May 24, 2012, 7:16:04 PM
to numf...@googlegroups.com


First hit on google for "sciome":  http://sciome.com/

Potentially confusing?


Warren

Travis Oliphant

May 24, 2012, 7:20:18 PM
to numf...@googlegroups.com
Yeah, possibly confusing... Or it could just be good advertising for them :-)

-Travis

Fernando Perez

May 24, 2012, 7:47:37 PM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 4:12 PM, Travis Oliphant <teoli...@gmail.com> wrote:
> Here is a brief (but not necessarily inclusive list):
>
> NumPy, SciPy, Matplotlib, IPython, Scikits-Learn, Scikits-statsmodels, Pandas, scikits-image, SymPy, Cython.
>
My original mental list was set up in layers, with python/numpy at the
foundation, then ipython, scipy, matplotlib and sympy for a 'core'
system (basically at the level of matlab, more or less), and adding
Cython, Mayavi, pandas, statsmodels, sklearn, skimage, pytables and
NetworkX for the whole enchilada.

Put another way: your list plus Mayavi, PyTables, and NetworkX, which,
I agree, are the most specialized of the bunch (and Mayavi brings in
complex dependencies).

> The point of this meta-package would be to create a concrete release number that is tied to specific releases of the other packages and give us a name to talk about the group of packages as a whole (when citing, etc.).
>
> The meta-package would have recommendations for documentation, continuous integration, and possibly even some utilities that made it easier to see the projects as a coherent group.
>

I really would like this meta-package project to include fairly
specific guidelines on things like packaging, documentation layout,
etc. Once this is in place, we can make it easier for end users to
access unified docs, provide integrated help in IPython with direct
links to example notebooks, etc.

In particular, I'm now convinced that we (the scientific community)
*must* plan for packaging solutions that bypass/ignore distutils as
needed. David C. has done enormous work on this front, but he was
never really able to gain any traction on python-dev, partly because
that group simply doesn't have our problems (such as how to link a
library against the Fortran compiler of some supercomputing vendor).
I think our energies are better spent on solving this problem well
for us than on trying any further to convince the authors of
distutils to think about massive C++/Fortran extensions they've never
encountered.

Specifically, I'd like to move towards fairly strict and predictable
(because by being predictable they enable automation) guidelines on:

- docstrings (format, naming conventions, etc.). We're already doing
reasonably well here with the NumPy standard.

- documentation layout, formatting and installation. In particular,
standalone examples that can be found by IPython in a search and
brought to the user for direct execution in the easiest way possible
(notebooks that also produce scripts work very well for this).

- packaging: bento or whatever it is. I don't know exactly what the
current situation is on this front; all I want is for us to forget
about what Python does by default and come up with a solution of our
own that is robust and easy to use for users with extension code, and
that easily allows installing pre-compiled binaries in
non-root/non-admin scenarios.

Something like this will obviously have to evolve organically over
time, but I think that if we have a common view of this problem, we
can find a way to provide to end users a more coherent and integrated
view of the tools, without the individual projects losing much
autonomy and independence.

My take on this is to move in our core projects (and hopefully thus
set a standard for others to follow) towards a 'federation' model,
where the individual projects retain most of their independence but
they do give a little bit up by accepting common standards on a few
fronts. In return, they'll gain predictability, better reuse of
common solutions, and I'm convinced, much greater impact in the long
run (because we'll become more attractive to end users).

> We need a name for this package.   I am proposing the name Sciome.    I would love to hear other ideas.

I've been racking my brain over this without much success, but I know
I'm terrible at naming things... Onyx is my current favorite (short,
has a y in it... as I said, I'm awful at names).

Cheers,

f

Nathaniel Smith

May 24, 2012, 8:05:12 PM
to numf...@googlegroups.com
It seems to me that if you produce tools and conventions and convince
those projects of their value, then they'll pick them up, and if you
don't, then they won't. If having a name to rally around helps with
that then excellent, but the discussion of giving up autonomy makes me
a bit uncomfortable. The only time giving up autonomy would help this
plan go forward is if your common standards are things that projects
would otherwise reject... and it doesn't seem like it'd be hard to
convince any of those projects to switch to a genuinely better build
system, documentation standard, example database, etc.

-- Nathaniel

Wes McKinney

May 24, 2012, 8:25:43 PM
to numf...@googlegroups.com
I pretty strongly agree that we need to have a standard bundle of
packages and a build/distribution mechanism that *just works every
time*. We're getting completely annihilated by this problem-- the
prototypical example is R, a much smaller community that seems to have
the packaging and distribution problem pretty much figured out compared
to Python. Whenever non-Python people ask me how they can get started, I
have to cringe and recommend they purchase EPD Full to avoid yak
shaving expeditions. It really sucks.

I do think more code reuse (or reuse of design patterns) among
packages would make sense in a lot of places. Although, as leader of
one of the projects on the list that has grown the most in the past 12
months, I'm not in a hurry to cede very much autonomy over the
development process. But I have been doing a lot of work integrating
pandas with other packages-- matplotlib and statsmodels in
particular-- so more collaboration between us for mutual benefit (and
creating a more cohesive / less confusing experience) as time goes on
only makes sense.

I'll start thinking about some names...

- Wes

John Hunter

May 24, 2012, 8:50:23 PM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 7:05 PM, Nathaniel Smith <n...@pobox.com> wrote:
> would otherwise reject... and it doesn't seem like it'd be hard to
> convince any of those projects to switch to a genuinely better build
> system, documentation standard, example database, etc.

It would be a major effort for matplotlib to switch its build system
(many platforms, C library dependencies, C/C++ src, many GUIs and
versions). What we have now sucks, because distutils has no proper
configure, but it is a hard-won suckiness from years of tweaking stuff
for various platforms and GUI versions. And we don't use the numpy
documentation conventions -- with our documentation coming in at
around 1,000 pages, porting is a non-trivial effort. Even though we
have been on Sphinx/ReST for years, and we're, I think, the first
project in the scientific Python world to make the jump, we still
haven't ported all of our API docs to proper Sphinx/ReST. So I think
the unifying efforts are great, and needed, but don't underestimate
the amount of effort involved.

Fernando Perez

May 24, 2012, 8:56:06 PM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 5:25 PM, Wes McKinney <wesm...@gmail.com> wrote:
> I pretty strongly agree that we need to have a standard bundle of
> packages and a build/distribution mechanism that *just works every
> time*. We're getting completely annihilated by this problem-- the
> prototypical example is R, a much smaller community who seem to have
> the packaging distribution problem pretty much figured out compared to
> Python. Whenever non-Python people ask me how they can get started I
> have to cringe and recommend they purchase EPD Full to avoid yak
> shaving expeditions. It really sucks.

That is *precisely* my concern.

> I do think more code reuse (or reuse of design patterns) among
> packages would make sense in a lot of places. Although, as leader of
> one of the projects on the list that has grown the most in the past 12
> months, I'm not in a hurry to cede very much autonomy over the
> development process.

I totally understand that, and to respond both to you and Nathaniel,
I'm also in the same boat: I don't believe in top-down, ham-fisted
prescriptions. Our open source world will simply ignore that.

But we do all already have some things that we follow: a setup.py
file, docstrings in functions, importing numpy for arrays, etc. All I
am thinking of are things at a similar level of burden, but with
agreed-upon commonality. If we sort out a packaging solution that
uses bento, and making it work well requires some special file in the
setup process, then we try to adopt that approach in a common way.
And for documentation, it would just be a matter of settling on a few
hooks in the right places of the doc/build process, along with a way
to indicate the layout of examples in a machine-readable way. Once
that is done, it becomes possible to offer help in IPython that's
better than "go to Google or ask on one of the various mailing
lists", which is just about the best we can do today. Anyone
who has used the Mathematica help browser will simply laugh at the
state of help searching in the current 'scipy stack'.
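
Purely as a sketch of what "machine-readable" could mean here (the format and field names below are invented on the spot, just for illustration), such an index of examples could be as small as:

# Hypothetical example index shipped by a package; every field name here is made up.
EXAMPLES = [
    {"title": "Fitting a curve with scipy.optimize",
     "path": "examples/curve_fitting.ipynb",
     "topics": ["optimization", "fitting"],
     "produces_script": True},
    {"title": "Solving a sparse linear system",
     "path": "examples/sparse_solvers.ipynb",
     "topics": ["linear algebra", "sparse"],
     "produces_script": True},
]

A help tool could then search the "topics" fields across installed packages and hand the matching notebook (or the script generated from it) straight to the user.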

I don't have any delusions of creating a rigid structure that locks
projects into a single release schedule or anything like that. But I
do think that we will *all* be better served in the future, and will
have a far greater impact, if we unify a few things *just a little bit
more*.

Most importantly, I want to make sure it's *crystal clear* from the
get-go that nowhere in this am I proposing that projects lose any
individuality in the minds of the end users or their development
community. I think it's a major strength of our ecosystem that all
projects are 'first class citizens' at the table and retain their
identity. I'm only thinking of unifying a few things that may benefit
all of us.

One point that it's important to keep in mind: we already have a lot of
this commonality informally specified. It's just that we only agree
at the 'python base' level, and that level is in this regard pretty
crappy, because the base Python world doesn't have our build problems,
doesn't care about docstrings and doesn't care about an integrated
help/documentation view of things.

I was expecting precisely this kind of pushback, and I hope we can
clarify through discussion that there's nothing to fear in this idea.
And obviously, that anything we do here will be done incrementally and
slowly, so that it can be adjusted via feedback as we go, and in a way
that makes people actually *want* to join :)

Cheers,

f

Fernando Perez

May 24, 2012, 9:04:17 PM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 5:50 PM, John Hunter <jdh...@gmail.com> wrote:

> It would be a major effort for matplotlib to switch its build system
> (many platforms, C library dependencies, C/C++ src, many GUIs and
> versions).  What we have now sucks, because distutils has no proper
> configure,  but it is a hard won suckiness of years of tweaking stuff
> for various platforms and GUI versions.  And we don't use the numpy
> documentation conventions -- with our documentation coming in at
> around 1,000 pages, porting is a non-trivial effort. Even though we
> have been on Sphinx/ReST for years, and we're I think the first
> project in the scientific python world to make the jump, we still
> haven't ported all of our API docs to proper Sphinx/ReST. So I think
> the unifying efforts are great, and needed, but don't underestimate
> the amount of effort involved.

Yup, I'm well aware of the mpl docs being a thorny one: they will
probably stay different for the foreseeable future, and that's fine.
We need to build things in a way that simply makes any adoption of the
common tools an improvement in the final experience, but which
works OK with the current base. So if the mpl docs look a bit
different to users, well, so be it.

As for the build machinery, we'll have to see: I'm not the expert on
that front. It should be possible to grab the accumulated knowledge
from the matplotlib setup.py file and translate it into a different
system. But again, what I have in mind is something that we can build
*incrementally*, without any binary 'flag days' where it's all or
nothing. David C. has already made a lot of progress with bento for
numpy and scipy (whose build isn't trivial), and I know Dag and Ondrej
are also working on this problem.

Cheers,

f

Matthew Turk

May 24, 2012, 9:07:48 PM
to numf...@googlegroups.com
Hi Fernando, everyone else,
The project I work on (yt) is not by any means a package suitable for
core inclusion; however, it builds on the packages and infrastructure
laid out in this thread, and suffers from many of the same
difficulties. I've been reading this discussion with interest, and
wanted to share how our project might view this development. So take
this data point for what it is.

The issues Fernando mentioned -- Fortran compilation, supercomputer
installation, and providing an ecosystem -- are three of the biggest
issues we struggle with from a distribution perspective. Having this
effort and initiative will be of great benefit to the developers of
this package (as it will reduce the barrier to entry for new community
members) and, by the same token, to new community members themselves.

In particular, as a developer of a non-core package, having such a set
of standards -- and ensuring that there is an enumerated set of
guidelines for living within the broader Onyx/Sciome community --
would be hugely beneficial. As noted by Fernando in his first email,
the needs of the scientific community are somewhat distinct from those
of the broader Python community, and this should shape the guidelines.

>
> Most importantly, I want to make sure it's *crystal clear* from the
> get-go that nowhere in this am I proposing that projects lose any
> individuality in the minds of the end users or their development
> community.  I think it's a major strength of our ecosystem that all
> projects are 'first class citizens' at the table and retain their
> identity.  I'm only thinking of unifying a few things that may benefit
> all of us.
>
> One point that its important to keep in mind: we already have a lot of
> this commonality informally specified.  It's just that we only agree
> at the 'python base' level, and that level is in this regard pretty
> crappy, because the base Python world doesn't have our build problems,
> doesn't care about docstrings and doesn't care about an integrated
> help/documentation view of things.
>
> I was expecting precisely this kind of pushback, and I hope we can
> clarify through discussion that there's nothing to fear in this idea.
> And obviously, that anything we do here will be done incrementally and
> slowly, so that it can be adjusted via feedback as we go, and in a way
> that makes people actually *want* to join :)

To respond to this: what would make our project want to shift (and we,
too, have a number of hacked-up build conventions to ease static
linking, module linking on supercomputers, etc.) would be just ensuring
that the build system is supported, not likely to go away, and already
on the system by virtue of coming with the underlying packages. As
to the docstrings, we migrated to NumPy's docstrings a while ago.

-Matt

>
> Cheers,
>
> f

Yaroslav Halchenko

May 24, 2012, 9:15:20 PM
to numf...@googlegroups.com

On Thu, 24 May 2012, Travis Oliphant wrote:
> We need a name for this package. I am proposing the name Sciome. I would love to hear other ideas.

sciome sounds cool but indeed might collide

I thought about

* spyence
- there is a video game
- used for "spy + science"

* spyci [to be pronounced as "spicy", thus hardly unique on Google]


--
Yaroslav O. Halchenko
Postdoctoral Fellow, Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik

Fernando Perez

May 24, 2012, 9:16:19 PM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 6:07 PM, Matthew Turk <matth...@gmail.com> wrote:
>  I've been reading this discussion with interest, and
> wanted to share how our project might view this development.  So take
> this data point for what it is.

Many thanks for pitching in, Matthew. Feedback from someone like you,
who's leading a discipline-specific project, is extremely important
and valuable here.

It would be great if one day, someone could publish an astrophysics
paper saying: to reproduce this, simply install onyx/sciome/X version
a.b and yt version c.d, and you're set. This would make a.b work
effectively as a hash for the versions of all the individual
components, while being very easy to reference and use.
Furthermore, you could then say: to replicate this immediately, follow
these steps:

1. Start up the reference Amazon onyx image ami-123456, which already
has the whole a.b suite installed.
2. Type 'super-install yt c.d' to get the right version of yt.
3. Run the script.

Today we have some of that in the starcluster images that JT Riley
prepares, but I still think there's a lot of room for improvement.

Cheers,

f

Stéfan van der Walt

May 24, 2012, 9:25:42 PM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 4:12 PM, Travis Oliphant <teoli...@gmail.com> wrote:
> We need a name for this package.   I am proposing the name Sciome.    I would love to hear other ideas.

Ah, our favorite game! What I like about Sciome is that it also
sounds like "Sci-Home", and that the word "biome" evokes the
appropriate connotations. Not the easiest to spell, but hopefully
people will get that right eventually. Otherwise, a name totally
unrelated to "sci" may also work.

Stéfan

Yaroslav Halchenko

May 24, 2012, 10:10:39 PM
to numf...@googlegroups.com
On Thu, 24 May 2012, Fernando Perez wrote:
> It would be great if one day, someone could publish an astrophysics
> paper saying: to reproduce this, simply install onyx/sciome/X version
> a.b and yt version c.d, and you're set. This would make a.b work
> effectively as a hash for the versions of all the individual
> components, but it would make it very easy to reference and use.
> Furthermore, you could then say: to replicate this immediately, follow
> these steps:

> 1. Start up the reference amazon onyx image ami-123456, that already
> has all the a.b suite installed.
> 2 type 'super-install yt c.d' to get the right version of yt.
> 3. run script.

> Today we have some of that in the starcluster images that JT Riley
> prepares, but I still think there's a lot of room for improvement.

And since the aspect of reproducibility has finally entered the game
(you can probably guess my take on it, with some backing [1]): although
there is huge value in a uniform language- or purpose-specific
environment for people on OSs without a FOSS distribution, there is
more to worry about when considering that it could be run across a
variety of underlying computer architectures [2] and platforms.

Therefore, in my view a complete system image (e.g. cloud AMI, or a
GNU/Linux distribution release codename/version) is valuable not only
for 'immediate replication' but as a least common denominator required
for 'precise replication'. Being able to say 'simply install
onyx/sciome/X' would be more valuable for getting people off the ground,
but I would not wholeheartedly count on it if, e.g., someone reads the
paper in 10 years and decides to replicate it on their system (unlike
with a VM image or preserved AMI).

But before diving into support for all the varieties of OSs out
there, I would recommend concentrating first on just shaping up the
'bundle' (as was mentioned -- mutually cross-supported and agreed-upon
versions, a plan for maintenance releases, unification of
documentation, etc.). That is where the foundation can be of big value
for the projects. I would also recommend considering the upcoming
Debian stable release as a candidate showcase platform, where you
still have a chance to get (nearly) all of the current (or planned)
versions supported with the help of the Debian community (even Ubuntu
folks might be of help). If I am not mistaken, all member projects
of the sciome are already in Debian (skimage should arrive in wheezy
within a week if nothing bad happens). For most of them we also have
-doc packages, which should be registered within doc-base [3] so the
documentation can easily be found. A conveniently composed AMI could
then provide not only a good demonstration of the bundle but also a
practical "application" for people to use to solve real problems. A VM
image with a nearly *identical* appearance/composition could also be
provided so that anyone could give it a shot on their own box/laptop
[4]. Who knows -- in the end they might end up dual-booting their
system having realized that they do not use the original OS any more
(but that is part of my plan, not yours ;) )

If there were a maintenance release schedule for those versions, with
care taken to avoid regressions and API breakage, an additional
APT repository could provide those *stable* releases so people could get
a rock-solid Python computing platform with adequate support and
improving quality. Altogether, with just a little effort and
coordination this scheme could easily work out, but it needs to happen
now (the Debian freeze is planned for the 2nd half of June, so any new
versions and large modifications should get uploaded before then).

I do foresee considerations/objections to such a plan, not the
least of which would be: "we like hacking, and 'supporting' is not
really fun (so we'd rather 'release often')"... "scientists need
up-to-date methodologies, so supporting a year-old release of XXX
diminishes their scientific novelty"... Although these all have their
merits, IMHO the problem at the moment is precisely the lack of the
ability to say "install XXX and you will be golden -- it will be stable
(thus with support and 'self-healing'), secure (thus updating), and you
will get great documentation and a working environment". The scientific
users I see rarely even upgrade their years-old boxes because they are
afraid of breaking them. That is why IMHO providing them with something
really reliable and easy to use (though maybe not the shiniest thing
out there) should be among the top priorities.

[1] http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html
[2] I have myself seen quite interesting differences in the behavior of
flawed coding decisions across architectures, so it is not just a tale,
especially considering the growing number of popular architectures in
mobile/computing.
[3] http://wiki.debian.org/doc-base
[4] I think we are quite successful with http://neuro.debian.net/vm.html

Andy Ray Terrel

May 24, 2012, 10:10:56 PM
to numf...@googlegroups.com, Chris Kees, Dag Sverre Seljebotn, Ondřej Čertík
I wanted to respond to the view that we need a common build structure.

For what it's worth, there are some efforts that I have a small part
in to unify the stack for HPC systems. Roughly there are at least four
distributions in the DoD HPC system that I know of. I've been
coordinating a bit with other HPC Python managers on big computers for
a while to see what we can do (although I have to admit that on my
computers I just gave up and ported EPD to them). Ondrej and Dag have
some documentation on what they are building at

https://github.com/certik/hashdist/wiki


Having supported build systems on several C++ / Python science
projects, I can say it is a daunting task. The big thing I think
NumFOCUS could help with is creating a movement around a
package system rather than everyone doing their own hack. I could
give lots of examples from Trilinos, PETSc, and FEniCS of where things
work or don't. But the one great thing about PETSc's BuildSystem was
that once it worked, pretty much anything that built on top of it worked
as well [0]. Now I'm not saying one should use BuildSystem, but I would
take that any day over the dozens of scripts I need to install the
Python stack on our machines.


As far as the name Sciome goes... I don't like it. Branding-wise it's
impossible to know how to spell it, or maybe I'm saying it in my head
wrong (too many ways to pronounce it). I would go for something more
Greek-god-like, Ophion perhaps.

-- Andy

[0] And PETSc's BuildSystem is so bad there is a highly inappropriate
YouTube video of it at http://www.youtube.com/watch?v=j1tbMW_Gxc4

Yaroslav Halchenko

May 24, 2012, 11:34:19 PM
to numf...@googlegroups.com
I guess Ondřej might comment on it himself, but wondering about the name
I thought: why not "SPD == Scientific Python Distribution" (or Sciome PD)?
But Google led me to
https://code.google.com/p/spdproject/
the Source Python Distribution (SPD), which was later reformed into Qsnake:
http://qsnake.com

So... maybe it is worth reviving SPD as a "brand"?

Paul Ivanov

May 25, 2012, 12:12:05 AM
to numf...@googlegroups.com
pyth (from pith: 3. the important or essential part; essence;
core; heart: the pith of the matter. 4. significant weight;
substance; solidity: an argument without pith. 5. Archaic .
spinal cord or bone marrow.)

or use it as an adjective: pythy stack

pyllar (from pillar, of the Doric, Ionic and Corinthian variety)
So much rests on its shoulders.

SciPyramid - a scipy superstructure!

best,
--
Paul Ivanov
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7

Brian Granger

May 25, 2012, 12:30:37 AM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 4:12 PM, Travis Oliphant <teoli...@gmail.com> wrote:
> Hey everyone,
>
> Fernando had the great idea to create a meta-package name with dependencies on a core set of packages that NumFOCUS is supporting:  The list of packages in this meta-package would not be as large as say EPD or Python(X,Y) or Sage, but it should be a decent sub-set:

It is not clear to me what is meant by a meta-package:

* Do we mean a binary installer for Python and said packages? Between
EPD, Python(X,Y) and Sage I don't think we need another distribution.
Plus, these days it is really easy to install everything even without
using one of these distributions.
* Do we mean a separate Python package that simply manages a global
namespace that makes it convenient to import things from a central
location? And centralizes documentation?
* A centralized and unifying website?

> Here is a brief (but not necessarily inclusive list):
>
> NumPy, SciPy, Matplotlib, IPython, Scikits-Learn, Scikits-statsmodels, Pandas, scikits-image, SymPy, Cython.
>
> The point of this meta-package would be to create a concrete release number that is tied to specific releases of the other packages and give us a name to talk about the group of packages as a whole (when citing, etc.).

But again what exactly is being released?

Cheers,

Brian

> The meta-package would have recommendations for documentation, continuous integration, and possibly even some utilities that made it easier to see the projects as a coherent group.
>
> We need a name for this package.   I am proposing the name Sciome.    I would love to hear other ideas.
>
> Best regards,
>
> -Travis



--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgra...@calpoly.edu and elli...@gmail.com

Wes McKinney

May 25, 2012, 12:37:32 AM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 12:30 AM, Brian Granger <elli...@gmail.com> wrote:
> On Thu, May 24, 2012 at 4:12 PM, Travis Oliphant <teoli...@gmail.com> wrote:
>> Hey everyone,
>>
>> Fernando had the great idea to create a meta-package name with dependencies on a core set of packages that NumFOCUS is supporting:  The list of packages in this meta-package would not be as large as say EPD or Python(X,Y) or Sage, but it should be a decent sub-set:
>
> It is not clear to me what is meant by a meta-package:
>
> * Do we mean a binary installer for Python and said packages?  Between
> EPD, Python(X,Y) and Sage I don't think we need another distribution.

EPD: Not free
Python(X,Y): Windows only
Sage: More monolithic, aimed at a more specialized audience

> Plus, these days it is really easy to install everything even without
> using one of these distributions.

Sadly, I wish this were true. Installation and package management is
the biggest roadblock to Python in businesses (and everywhere else)
that I (and many others) encounter on a daily basis.

Brian Granger

May 25, 2012, 1:26:53 AM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 9:37 PM, Wes McKinney <wesm...@gmail.com> wrote:
> On Fri, May 25, 2012 at 12:30 AM, Brian Granger <elli...@gmail.com> wrote:
>> On Thu, May 24, 2012 at 4:12 PM, Travis Oliphant <teoli...@gmail.com> wrote:
>>> Hey everyone,
>>>
>>> Fernando had the great idea to create a meta-package name with dependencies on a core set of packages that NumFOCUS is supporting:  The list of packages in this meta-package would not be as large as say EPD or Python(X,Y) or Sage, but it should be a decent sub-set:
>>
>> It is not clear to me what is meant by a meta-package:
>>
>> * Do we mean a binary installer for Python and said packages?  Between
>> EPD, Python(X,Y) and Sage I don't think we need another distribution.
>
> EPD: Not free

[I don't work for Enthought...]

I don't see how that is a problem. I have no idea what the average
hourly salary is of people on this list, but it is easily in the
$50-$150 per hour range. EPD costs $200 per year and that includes
phone installation support. If you spend more than 1-4 hours *per
year* mucking with installation issues, you should just buy EPD. If
you don't, you are wasting time and money you could be spending
elsewhere. It is not about the purity of free/open software, it is
about time being a finite resource for all of us. For organizations
that can volume license the cost goes down even more. There is a
reason that EPD costs money - building and maintaining binary
installers for multiple platforms is incredibly time consuming. I am
sure that Enthought has many, many man years of labor invested in EPD
at this point. Do we really want to start from scratch and repeat all
that work? I guarantee that if we go that route, the effective cost
to the community will be greater than $200/year per person in time.
That price may not be paid by each of us, but someone will have to pay
for it.

What reasons are there to not use EPD?

> Python(X,Y): Windows only

Yes, limited scope, but for the most part Linux is covered nicely by
package managers.

> Sage: More monolithic, aimed at a more specialized audience

Yes definitely.

>> Plus, these days it is really easy to install everything even without
>> using one of these distributions.
>
> Sadly, I wish this were true. Installation and package management is
> the biggest roadblock to Python in businesses (and everywhere else)
> that I (and many others) encounter on a daily basis.

It is all relative to the user's experience. When I started hacking
with Python, just installing numpy+scipy was a major project, let
alone matplotlib. On that scale, things have become much easier -
that is what I mean.

There are definitely some packages that are more difficult to install
though, such as VTK+Mayavi. And yes, you do still have to do package
*management* even if things do install nicely. It really depends on
the type of user you are working with. For the undergrads I work with
at the University, I could *never* have them install packages one by
one. For those users I recommend EPD and don't really ever have a
problem.

Brian Granger

May 25, 2012, 1:32:31 AM
to numf...@googlegroups.com
Oh, not to mention that EPD is linked against and includes Intel MKL,
which itself retails for $400.

Gael Varoquaux

May 25, 2012, 1:33:03 AM
to numf...@googlegroups.com
On Thu, May 24, 2012 at 06:12:13PM -0500, Travis Oliphant wrote:
> Here is a brief (but not necessarily inclusive list):

> NumPy, SciPy, Matplotlib, IPython, Scikits-Learn, Scikits-statsmodels, Pandas, scikits-image, SymPy, Cython.

A remark that risks side-tracking the name discussion: this is a list of
packages that is, in my eyes, quite biased toward data processing. There
are other important usage patterns of the scipy stack. A few years ago, I
would have had no use for scikit-learn, statsmodels, pandas and
scikits-image, but I would have much rather used pyDSTools, numexpr,
fipy. In addition, another package that it seems should really be in is
NetworkX.

As a name, I can propose one that I think is fairly explicit, though
maybe confusing and unexciting: scipylab. I must confess that what I
don't like about a proposal like 'onyx' is that it doesn't convey much
information or hint about what it is: if I see an Amazon VM instance
called 'onyx', the chances that I would look at it are small. I can
suggest a few other names that are slightly more descriptive and still
fun: pyrogue, pyle, pylar.

My 2 cents,

Gaël

Dag Sverre Seljebotn

May 25, 2012, 3:12:13 AM
to numf...@googlegroups.com
I'll join Brian in saying that we need to clarify the concepts a bit; this thread seems all muddled up to me so far. Here's my attempt to disentangle:

a) There's "synchronized release numbering", where the point is simply to avoid having to say that SciPy X needs NumPy newer than A but older than B, and Pandas Y depends on NumPy Z. Instead, one creates a "virtual package" that depends on SciPy X, Pandas Y and NumPy Z, and then just refers to the number of that package. This seems to me to be what Travis is getting at.

It just requires somebody to track which versions of the core tools work together and, every time there's a new release of any of the core tools, to increment the version number. Then, at least for now, that would be "syndicated" to the distributions (so you'd say that Ubuntu 12.04 is Sciome 1.1-compatible, EPD 8 is Sciome 1.2-compatible, and so on). More of a "stamp of approval" than anything else.

b) There are common conventions for documentation, etc. (This is different from a) because there's no *breakage* involved.)

c) There's "scientific software distribution", like EPD, Python(X,Y), Sage, and the tools Ondrej and I will be working on. This is in fact very different from a): i) you need to deal with all the optional packages (the last 5% that's different for everybody); ii) there must be a lot more detail -- for instance, Travis didn't have to specify a LAPACK for his idea above, because if Pandas has a dependency on NumPy 1.7, it has that regardless of what LAPACK NumPy is built with.

d) There's the issue of build tools like distutils, waf, scons (this is different from c) because c) typically has to use a different build tool for each package, and nothing will change that)

e) There's the issue of synchronized release dates/a cadence

On one hand, R solves all of these under the common name "R". But on the other hand, I think these are mostly orthogonal problems that can be treated one by one. (If somebody had a gazillion $ and created a "center for scientific Python" with a core team working just on making a nice experience, you could unify all of these efforts under the same name -- but I don't think tackling them together is constructive in our current situation. Also, keep in mind that the R solution is just a special case: although it works for a large segment, it's not for everybody. The R solution ported to the Python world would really be horrible for me doing HPC.)

Now, we could probably all rant on this forever, but let's not forget that for b), c), d) what is mostly lacking is just a lot of work; the big visions have been around for decades. I spent a good part of March walking around and talking to people just to try to get a good idea of how Ondrej and I should start to tackle c)... but I don't think this thread is the right venue. Please, let's talk about software distribution and building in another thread.

As for a), when it comes to names, I'm -1 on any Greek god; I like very descriptive names, not cute names... Something around the name "SciPy" ("SciPy Stack", "SciPy Tools", "Tools for Scientific Python") would be better; the conferences are already named around that. The point is that when communicating to others who are not using Python, what "trademark" do you use to get the message across? Well, "Python", or "scientific Python". We don't need yet another trademark.

Dag

Dag Sverre Seljebotn

May 25, 2012, 3:42:15 AM
to numf...@googlegroups.com
Reading a bit in other threads, I see that b) (documentation, continuous
integration, etc.) is also something that Travis was thinking would be
part of Sciome. I would disagree: it's important to keep things as
simple and lean as possible -- for instance, you want a documentation
standard to spread as far as possible, not be restricted to Sciome.

So how about this:

I think what we need is something like a "standards body"; a Sci-PEP
process. I'll start a new thread on that.

Dag

Stéfan van der Walt

May 25, 2012, 4:42:24 AM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 12:42 AM, Dag Sverre Seljebotn
<d.s.se...@astro.uio.no> wrote:
> I think what we need is something like a "standards body"; a Sci-PEP
> process. I'll start a new thread on that.

I think the idea is to lead by example, rather than by prescription.
No-one's trying to prevent the docstring standard from being adopted
elsewhere, e.g.

Stéfan

Dag Sverre Seljebotn

May 25, 2012, 5:12:11 AM
to numf...@googlegroups.com
Point taken.

I still think it's a good idea to separate out orthogonal concepts as
much as possible and discuss them one by one.

The fact that there's a set of packages ready to implement any specs one
comes up with is of course useful, but I don't see that a branding name
is so important for that group -- i.e. that group of packages doesn't
necessarily overlap 100% with the group of packages where you want to
synchronize releases behind a common version number, and that would be fine.

Also, as a practical matter, where exactly do I go to find the current
edition of the "docstring standard"? (No, I really don't know -- I'm
guessing it's either in the SciPy or NumPy repos or websites somewhere)

Dag

Scott Sinclair

May 25, 2012, 6:23:46 AM
to numf...@googlegroups.com
On 25 May 2012 11:12, Dag Sverre Seljebotn <d.s.se...@astro.uio.no> wrote:
> Also, as a practical matter, where exactly do I go to find the current
> edition of the "docstring standard"? (No, I really don't know -- I'm
> guessing it's either in the SciPy or NumPy repos or websites somewhere)

https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
https://github.com/numpy/numpy/blob/master/doc/EXAMPLE_DOCSTRING.rst.txt
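
For quick orientation, a minimal docstring in that format looks roughly like this (the function itself is just a toy example, not taken from any of the packages):

import numpy as np

def clip(a, lo, hi):
    """
    Clip the values of an array to the interval [lo, hi].

    Parameters
    ----------
    a : ndarray
        Input array.
    lo, hi : float
        Lower and upper bounds.

    Returns
    -------
    clipped : ndarray
        Copy of `a` with values limited to [lo, hi].

    Examples
    --------
    >>> clip(np.array([1, 5, 10]), 2, 8)
    array([2, 5, 8])
    """
    return np.minimum(np.maximum(a, lo), hi)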

Cheers,
Scott

Nathaniel Smith

May 25, 2012, 7:24:12 AM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 6:33 AM, Gael Varoquaux
<gael.va...@normalesup.org> wrote:
> As a name, I can propose one that I think is farely explicit, though
> maybe confusing and unexciting: scipylab. I must confess that what I
> don't like about a proposal like 'onyx', is that it doesn't convey much
> information or hint about what it is: if I see an amazon VM instance
> called 'onyx', chances that I look at it are small. I can suggest a few
> other names that are slighlty more descriptive and still fun: pyrogue,
> pyle, pylar.

Why not just resurrect the "pylab" name?

-- Nathaniel

Gael Varoquaux

May 25, 2012, 7:26:41 AM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 12:24:12PM +0100, Nathaniel Smith wrote:
> > As a name, I can propose one that I think is farely explicit, though
> > maybe confusing and unexciting: scipylab. I must confess that what I
> > don't like about a proposal like 'onyx', is that it doesn't convey much
> > information or hint about what it is: if I see an amazon VM instance
> > called 'onyx', chances that I look at it are small. I can suggest a few
> > other names that are slighlty more descriptive and still fun: pyrogue,
> > pyle, pylar.

> Why not just resurrect the "pylab" name?

Because it is not dead.

G

John Hunter

May 25, 2012, 7:31:52 AM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 6:24 AM, Nathaniel Smith <n...@pobox.com> wrote:

> Why not just resurrect the "pylab" name?

Google "pylab". There are 187,000 hits for a package with related
functionality. Might be a tad confusing.


I do like pyllar a lot though, since it is the core that other stuff
can rest upon. Nice suggestion Paul.

Nathaniel Smith

May 25, 2012, 7:35:30 AM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 12:31 PM, John Hunter <jdh...@gmail.com> wrote:
> On Fri, May 25, 2012 at 6:24 AM, Nathaniel Smith <n...@pobox.com> wrote:
>
>> Why not just resurrect the "pylab" name?
>
> google pylab.  There are 187,000 hits for a package with related
> functionality.  Might be a tad bit confusing.

The "package with related functionality" is ipython + numpy + scipy +
matplotlib. Not really incompatible with what we're talking about
here...

(I also see one hit on some "PyLab-Works" thing, but that appears to
be someone's personal project that hasn't been updated in years and
has no proper webpage.)

- N

Nathaniel Smith

May 25, 2012, 7:37:37 AM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 3:10 AM, Andy Ray Terrel <andy....@gmail.com> wrote:
> [0] And PETSc's BuildSystem is so bad there is a highly inappropriate
> YouTube video of it at http://www.youtube.com/watch?v=j1tbMW_Gxc4

Hah, fabulous. I had no idea that
grad-student-produced-python-propaganda-videos was a genre...
https://www.youtube.com/watch?v=1lBeungEnx4

- N

Travis Oliphant

May 25, 2012, 8:15:15 AM
to numf...@googlegroups.com
On May 25, 2012, at 6:35 AM, Nathaniel Smith wrote:

> On Fri, May 25, 2012 at 12:31 PM, John Hunter <jdh...@gmail.com> wrote:
>> On Fri, May 25, 2012 at 6:24 AM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>>> Why not just resurrect the "pylab" name?
>>
>> google pylab.  There are 187,000 hits for a package with related
>> functionality.  Might be a tad bit confusing.
>
> The "package with related functionality" is ipython + numpy + scipy +
> matplotlib. Not really incompatible with what we're talking about
> here...


I had thought of pylab, which does have overlap with what we are talking about here, but felt that it has too much of a history that is not quite what we might want (especially the over-arching namespace). I think it would take some effort to redirect the pylab meaning, but it may be worth the effort.

I kind of like pyllar as well.   But there is a 3D viz package for Python by the same name:   http://pyllar.sourceforge.net/

-Travis

Travis Oliphant

May 25, 2012, 8:37:26 AM
to numf...@googlegroups.com
On May 25, 2012, at 2:12 AM, Dag Sverre Seljebotn wrote:


> On Friday, May 25, 2012 1:12:13 AM UTC+2, teoliphant wrote:
>> Hey everyone,
>>
>> Fernando had the great idea to create a meta-package name with dependencies on a core set of packages that NumFOCUS is supporting:  The list of packages in this meta-package would not be as large as say EPD or Python(X,Y) or Sage, but it should be a decent sub-set:
>>
>> Here is a brief (but not necessarily inclusive list):
>>
>> NumPy, SciPy, Matplotlib, IPython, Scikits-Learn, Scikits-statsmodels, Pandas, scikits-image, SymPy, Cython.
>>
>> The point of this meta-package would be to create a concrete release number that is tied to specific releases of the other packages and give us a name to talk about the group of packages as a whole (when citing, etc.).
>>
>> The meta-package would have recommendations for documentation, continuous integration, and possibly even some utilities that made it easier to see the projects as a coherent group.
>>
>> We need a name for this package.   I am proposing the name Sciome.    I would love to hear other ideas.
>
> I'll join Brian in that we need to clarify the concepts a bit, this thread seems all muddled up to me so far. Here's my attempt to disentangle:
>
> a) There's "synchronized release numbering", where the point is simply to avoid having to say that SciPy X needs NumPy newer than A but newer than B, and Pandas Y depends on NumPy Z. Instead, one creates a "virtual package" that depends on SciPy X, Pandas Y and NumPy Z, and then just refer to the number of that package. This is what Travis seems to be getting at to me.

Yes, this is the primary point.   If we just had this we'd have a place to start.   It really is a meta-package or virtual-package.  It could be as simple as a web-page that states <meta-name> - 1.0 is NumPy 1.7, SciPy 0.10, Pandas 11.7, ....

But, if you are really going to create this web-page, you might as well also build a simple RPM which just has the dependencies as well as a PKG file.     
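
To illustrate (the name, the mapping, and every version number below are made up), the machine-readable core of such a web page could be as small as a table like this, which a tiny utility could also use to check an installation:

# Hypothetical mapping of meta-package releases to exact component versions.
# The names and numbers are placeholders, not an actual proposal.
RELEASES = {
    "1.0": {"numpy": "1.6.2", "scipy": "0.10.1", "pandas": "0.8.0"},
    "1.1": {"numpy": "1.7.0", "scipy": "0.11.0", "pandas": "0.9.0"},
}

def check(meta_version):
    """Report how the installed packages compare to a given meta-release."""
    import importlib
    for name, wanted in sorted(RELEASES[meta_version].items()):
        try:
            have = importlib.import_module(name).__version__
        except ImportError:
            have = "not installed"
        status = "OK" if have == wanted else "MISMATCH"
        print("%-12s want %-8s have %-12s [%s]" % (name, wanted, have, status))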

In an ideal world, perhaps, there would also be installers maintained and freely available, but that is not a requirement of the proposal.    I don't work for Enthought, either, but I have had quite a bit of involvement with EPD, and understand the challenges quite well. 


> b) There's common conventions for documentation etc.. (This is different from a) because there's no *breakage* involved)

Fernando feels pretty strongly about this, and we already have the NumPy documentation convention.    I think the point would be just to advertise this convention more strongly.   NumFOCUS could do this.    

NumFOCUS is also a useful place where information about continuous integration choices and strategies could be advertised.   I'm very interested in knowing who the people are that would be interested in this kind of work.  So far, I have not found anyone --- even though I have budget for this.


> c) There's "scientific software distribution", like EPD, Python(X,Y), Sage, and the tools me and Ondrej will be working on. This is in fact very different from a): i) You need to deal with all the optional packages (the last 5% that's different for everybody), ii) there must be a lot more detail, for instance, Travis didn't have to specify a LAPACK for his idea above, because if Pandas has a dependency on NumPy 1.7, it has that regardless of what LAPACK NumPy is built with.

This should be discussed in a separate thread if at all, as Dag suggests.   It is an independent point and has multiple lines of discussion, given all the platforms that people actually will want to use (especially in the HPC world).


> d) There's the issue of build tools like distutils, waf, scons (this is different from c) because c) typically has to use a different build tool for each package, and nothing will change that)

This is also a point for another thread.   There are several issues, but I think the scientific Python community can provide real value by just going with and promoting David C's bento package.   It is a nice system.   It is a package specification tool that can replace distutils.  One of its better features is that it can integrate fairly easily with multiple build tools.    I've heard a lot of love for CMake, for example.


> e) There's the issue of synchronized release dates/a cadence

It's a good idea to have a cadence to the meta-package.   Something like a 6-month cadence would be a nice pattern, perhaps alternating between stable and development releases.

> As for a), when it comes to names, I'm -1 on any greek god; I like very descriptive names, not cute names...something around the name "SciPy" ("SciPy Stack", "SciPy Tools", "Tools for scientific Python") would be better, the conferences are already named around that. The point is that when communicating to others who are not using Python, what "trademark" do you use to get the message across? Well, "Python", or "scientific Python". We don't need yet another trademark.

I'm interested in how others feel about the names proposed.   So far, I personally like:

Sciome
pyllar
pylab
pythy

What about:  pyome? 


-Travis

Travis Oliphant

May 25, 2012, 8:40:52 AM
to numf...@googlegroups.com

On May 25, 2012, at 12:33 AM, Gael Varoquaux wrote:

> On Thu, May 24, 2012 at 06:12:13PM -0500, Travis Oliphant wrote:
>> Here is a brief (but not necessarily inclusive list):
>
>> NumPy, SciPy, Matplotlib, IPython, Scikits-Learn, Scikits-statsmodels, Pandas, scikits-image, SymPy, Cython.
>
> A remark that risks side-tracking the name discussion: this is a list of
> packages that is, in my eyes, quite biassed toward data processing. There
> are other important usage patterns of the scipy stack. A few years ago, I
> would have had no use for scikit-learn, statsmodels, pandas and
> scikits-image, but I would have much rather used pyDSTools, numexpr,
> fipy. In addition, another package that seems should really be in is
> NetworkX.

You make great points, Gael. I would love to hear what your list would be.

Yaroslav pointed out to me that there are 291 debian packages that depend on NumPy. So, I'm quite confident that my use-cases are only a small sampling of how people use the stack.


>
> As a name, I can propose one that I think is farely explicit, though
> maybe confusing and unexciting: scipylab. I must confess that what I
> don't like about a proposal like 'onyx', is that it doesn't convey much
> information or hint about what it is: if I see an amazon VM instance
> called 'onyx', chances that I look at it are small. I can suggest a few
> other names that are slighlty more descriptive and still fun: pyrogue,
> pyle, pylar.

pylar is a pretty good name from my perspective. Another one: xyome?


Thanks,

-Travis


Travis Oliphant

May 25, 2012, 8:45:25 AM
to numf...@googlegroups.com

On May 24, 2012, at 10:34 PM, Yaroslav Halchenko wrote:

> I guess Ondřej might comment on it himself but wondering about the name
> I thought why not "SPD == Scientific Python Distribution" (or Sciome PD)
> but google lead me to
> https://code.google.com/p/spdproject/
> Source Python Distribution (SPD) which later was reformed into Qsnake
> http://qsnake.com
>
> So... may be it is worth reviving SPD as a "brand"?

I like the name SPD, but only if it's describing an actual distribution (on Debian it would work because there are so many packages, just having the name and creating a meta-package is enough --- Debian could lead the way here). But, perhaps SciPD would be better to distinguish it from source python distribution.

The meta-package that we are trying to come up with a name for should likely have a separate name as well, so that there could be some of the additional cohesiveness that Fernando is championing beyond just the tagging of the packages.

-Travis

Travis Oliphant

May 25, 2012, 8:50:48 AM
to numf...@googlegroups.com
>
> It is not clear to me what is meant by a meta-package:
>
> * Do we mean a binary installer for Python and said packages? Between
> EPD, Python(X,Y) and Sage I don't think we need another distribution.
> Plus, these days it is really easy to install everything even without
> using one of these distributions.
> * Do we mean a separate Python package that simply manages a global
> namespace that makes it convenient to import things from a central
> location? And centralizes documentation?
> * A centralized and unifying website?


Actually, not any of these, as has hopefully been clarified (except maybe the website point). Really, it's just a tagging of packages, and perhaps a website where developers and users of these packages can go to coordinate.

NumFOCUS is also very interested in supporting packaging tools, continuous integration, development, etc. Having a name to describe the collection of packages that are core projects for NumFOCUS would be really helpful in many ways.

-Travis

Dag Sverre Seljebotn

May 25, 2012, 8:51:08 AM
to numf...@googlegroups.com
Does this relate in any way to the scikits vs. SciPy discussion?

I think it's confusing with scikits, SciPy, "Sciome" all being
"convenient bags of stuff for scientific computing". If I'm a new user
and need functionality X, it's not obvious to me in which bag to start
looking.

I guess ideally (given perfect infrastructure etc.) I'd like SciPy to
disappear (disintegrate into separate scikits perhaps) and then use the
name "SciPy" for the meta-package.

But perhaps that is just extremely impractical..

Dag

Travis Oliphant

May 25, 2012, 8:58:21 AM
to numf...@googlegroups.com
>>
>> I'm interested how others feel about the names proposed. So far, I
>> personally like:
>>
>> Sciome
>> pyllar
>> pylab
>> pythy
>>
>> What about: pyome?
>
> Does this relate in any way to the scikits vs. SciPy discussion?
>
> I think it's confusing with scikits, SciPy, "Sciome" all being "convenient bags of stuff for scientific computing". If I'm a new user and need functionality X, it's not obvious to me in which bag to start looking.
>
> I guess ideally (given perfect infrastructure etc.) I'd like SciPy to disappear (disintegrate into separate scikits perhaps) and then use the name "SciPy" for the meta-package.
>
> But perhaps that is just extremely impractical..

I like your idea in general. But, at this point SciPy is too entrenched as a library. The point of view of the new / occasional user is an important use-case for the new name.

Maybe we just call it 'P' or 'Q'

-Travis


Fernando Perez

May 25, 2012, 1:25:20 PM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 5:51 AM, Dag Sverre Seljebotn
<d.s.se...@astro.uio.no> wrote:
> If I'm a new user and need functionality X, it's not obvious to me in which
> bag to start looking.

And that's one of the things that I'd like to change: an installation
tag/name that includes a reasonably complete core set, whose
documentation can all be pooled together for good integrated search
(even if it requires re-indexing occasionally, as the Mathematica help
system does), would help tremendously on this.

Especially if the same help/search tools are available locally and on
the website, so that one can provide online links that can be
immediately used at the user's desktop.

> I guess ideally (given perfect infrastructure etc.) I'd like SciPy to
> disappear (disintegrate into separate scikits perhaps) and then use the name
> "SciPy" for the meta-package.
>
> But perhaps that is just extremely impractical..

I kind of feel the same :)

f

Gael Varoquaux

May 25, 2012, 1:51:38 PM
to numf...@googlegroups.com
On Fri, May 25, 2012 at 07:40:52AM -0500, Travis Oliphant wrote:
> >> NumPy, SciPy, Matplotlib, IPython, Scikits-Learn, Scikits-statsmodels, Pandas, scikits-image, SymPy, Cython.

> > A remark that risks side-tracking the name discussion: this is a list of
> > packages that is, in my eyes, quite biassed toward data processing. There
> > are other important usage patterns of the scipy stack. A few years ago, I
> > would have had no use for scikit-learn, statsmodels, pandas and
> > scikits-image, but I would have much rather used pyDSTools, numexpr,
> > fipy. In addition, another package that seems should really be in is
> > NetworkX.

> You make great points, Gael. I would love to hear what your list would be.

I am honestly not sure. I would clearly put in NetworkX, most probably
numexpr and either pyDSTools or fipy, even though I don't use them
myself, and thus cannot really have an opinion on them.

> pylar is a pretty good name from my perspective. Another one: xyome?

It seems that pyllar is already used. How about pyrogue? Other
suggestions: spyglass, spyhole, spycy, spyce. Actually, I think that I
like a variation on 'pylar': 'scipylar'.

G

Josh Klein

unread,
May 25, 2012, 7:25:14 PM5/25/12
to numf...@googlegroups.com
I feel a little out of my league making recommendations on this list, but given my background I figure I could add some naming thoughts into the mix.

I'd recommend against anything hard to pronounce (users need to verbalize it to share it in person), or confusing to spell (people who hear it need to type it).  I find that PyWord or WordPy naming is a fair approach, but can become unwieldy rather quickly. It often violates the above principle of simplicity unless the choice is obvious and descriptive.

It's also worth thinking about naming relative to other language alternatives, rather than to the Python world. I mention this because 1-to-4 letter acronyms seem to be common in scientific software. Here's some fodder for thinking about that: http://en.wikipedia.org/wiki/List_of_numerical_analysis_software

An acronym would let you stay boring but clear; SP for Scientific Python, as the most obvious example. And of course, mythology is a fine source for a simple brand name - here's some more fodder for that: http://en.wikipedia.org/wiki/Knowledge_deity

- Josh

Nathaniel Smith

unread,
May 25, 2012, 7:55:17 PM5/25/12
to numf...@googlegroups.com
On Fri, May 25, 2012 at 6:25 PM, Fernando Perez <fpere...@gmail.com> wrote:
> On Fri, May 25, 2012 at 5:51 AM, Dag Sverre Seljebotn
> <d.s.se...@astro.uio.no> wrote:
>> If I'm a new user and need functionality X, it's not obvious to me in which
>> bag to start looking.
>
> And that's one of the things that I'd like to change: an installation
> tag/name that includes a reasonably complete core set, whose
> documentation can all be pooled together for good integrated search
> (even if it requires re-indexing occasionally, as the Mathematica help
> system does), would help tremendously on this.

I don't see how any solution which requires that people use a fixed
set of package versions is going to get off the ground. People will
still need to upgrade random packages, the projects that have more
aggressive release cycles are never going to tell their users to wait
6 months for the next "meta-release" before upgrading, and how are you
going to convince people to install the right set in the first place?
It seems pretty unlikely that, say, Enthought would be willing to hand
over control to your meta-project to decide which combination of
versions go into their releases.

R doesn't even try to do this, FWIW -- each R release is (1) a new
version of the interpreter, (2) a new version of the small core
library equivalent (packages "base", "stats", "splines", maybe a few
others), and (3) a bunch of random high-quality packages off of CRAN
that come pre-installed. The latter packages can be upgraded just like
how you install any other package off CRAN, though: typing
install.packages("whatever") at the REPL.

If I were trying to make scientific python a more coherent
environment, I'd focus on small integration features, not tags. Things
like:

- Make installing new packages "just work". This is a major pain point
for me, because we work in a traditional unix-y environment and when I
need to help someone get started, the first problem is that python has
no concept of a user search path, so installing packages just doesn't
work, and people resort to horrors like 'sudo python setup.py
install'. (And even that doesn't work if you don't have root.) In
practice the solution is something like setting up a virtualenv as the
first thing and then putting a 'source .../activate' in one's .bashrc.
Not sure if Windows and OS X have the same problem. (I guess Windows
probably avoids the permissions issues but falls over as soon as you
try to install anything that needs a compiler?)

- Make installing convenient. Ipython %install and %upgrade magics
that defer to pip, with whatever magic is necessary to make sure that
sys.path gets updated if new .pth files are added?

- Richer conventions for pulling metadata out of installed packages --
one way of putting this is, it really should be possible for ipython
to figure out for an arbitrary cooperative package: how to run the
self-tests, how to find/build the local docs (and then index them),
where the docs are located online, how to pull out runnable examples,
etc. This "just" needs some conventions, like some extra
__magic_attributes__ at the top-level. I bet if ipython starts looking
for them then people will add them. (A rough sketch of what such a
probe could look like follows after this list.)

- Publishing canonical intersphinx links for all the packages, so it's
easy for docs to link to each other. (Put them all on readthedocs.org,
maybe?)

- Writing up HOWTOs for common packaging problems. That numpy
docstring HOWTO is great; but I never knew where to look to find it
before. (I'm still not sure how to set up a package to use the numpy
sphinx extensions, just a 20 line example would be super helpful.) Or,
say, "here's how to find BLAS using numpy.distutils". "Here's how to
make mypackage.test() work like it does for numpy and scipy." And so on.

- Market the term pylab or whatever as the name for any environment
that has the above nice stuff bundled up together.
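
Coming back to the metadata-conventions item above, here is a rough sketch of
what such a probe could look like. The attribute names (__test_command__,
__docs_url__) are made up for illustration; they are not an existing IPython
or packaging convention:

import importlib

def probe_package(name):
    # Import the package and read whatever convention attributes it exposes.
    mod = importlib.import_module(name)
    return {
        "version": getattr(mod, "__version__", "unknown"),
        "test_command": getattr(mod, "__test_command__", None),
        "docs_url": getattr(mod, "__docs_url__", None),
    }

print(probe_package("numpy"))

A front-end like ipython could loop over installed packages, collect these
attributes, and build a combined index of tests, docs and examples from them.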

> Especially if the same help/search tools are available locally and on
> the website, so that one can provide online links that can be
> immediately used at the user's desktop.
>
>> I guess ideally (given perfect infrastructure etc.) I'd like SciPy to
>> disappear (disintegrate into separate scikits perhaps) and then use the name
>> "SciPy" for the meta-package.
>>
>> But perhaps that is just extremely impractical..
>
> I kind of feel the same :)

Me too -- at this point scipy is sort of a weird grab-bag workaround
for installing packages being hard. But disintegrating it would take
work and it seems like there's higher priority stuff...

- N

Stéfan van der Walt

unread,
May 25, 2012, 9:01:41 PM5/25/12
to numf...@googlegroups.com
On Fri, May 25, 2012 at 4:55 PM, Nathaniel Smith <n...@pobox.com> wrote:
> I don't see how any solution which requires that people use a fixed
> set of package versions is going to get off the ground.

I don't think you should underestimate the value of such a set. Being
able to tell someone that they can reproduce your research / run your
app using, e.g., Sciome 12, is much more appealing than guiding them
through the combination of subpackages needed. Also, such a versioned
set can very easily be supported by EPD, Python(x,y), Debian, etc.
without any additional infrastructure.

In fact, I would argue that, at least in the world of science,
reproducible research *demands* a versioned package approach that
we've been lacking for a long time.

The first obvious problem is that sub-package releases may not be in
sync with Sciome / whatever. But that's exactly why we have this
conversation, and why those packages need to be on board.

Stéfan

Yaroslav Halchenko

unread,
May 25, 2012, 11:09:08 PM5/25/12
to numf...@googlegroups.com

> Yaroslav pointed out to me that there are 291 debian packages that depend on NumPy. So, I'm quite confident that my use-cases are only a small sampling of how people use the stack.

heh -- just so that the truth gets known -- I was too fast with my assessment
of "apt-cache rdepends python-numpy | nl". It had duplicates, and multiple
binary packages depending on numpy might come out of a single project, which
would not be good either. So, in the hope that it might be of interest to some
of you, I have gathered a more adequate list. This time it is of source
packages whose binary packages depend on numpy or scipy (the latter providing
an indirect dependence on numpy).

Total number of projects is 104 (removing 2 versioned ones - rpy/rpy2 and
pymvpa/pymvpa2).

I also sorted them by the popularity contest (http://popcon.debian.org/) total
installation number for that 'source' package. (The count might be even higher
than numpy's if some binary packages produced from a source package get
installed without numpy itself being installed.) There are also a few versioned
pkgs left in place (e.g. rpy/rpy2 and our pymvpa/pymvpa2), and the installation
count is quite biased for some packages because they get installed as
dependencies of some big meta-packages (e.g. science-machine-learning,
science-neuroscience-cognitive, etc.).

Enjoy:

# popcon project
0 132200 pygtk
1 75925 opencv
2 60892 python-numpy
3 44961 inkscape
4 31769 pyopengl
5 11651 matplotlib
6 9074 plplot
7 7396 pygame
8 6343 gdal
9 3995 python-scipy
10 2790 python-scientific
11 2210 aubio
12 1719 mathgl
13 1265 grass
14 962 pytables
15 912 shogun
16 911 libvigraimpex
17 838 scikit-learn
18 754 syfi
19 720 rpy
20 695 sympy
21 669 spyder
22 666 pykaraoke
23 533 pymvpa
24 525 rpy2
25 505 scitools
26 477 h5py
27 459 libfreenect
28 458 python-biopython
29 433 pyqwt5
30 414 openmeeg
31 399 statsmodels
32 395 psignifit
33 386 pytools
34 384 mdp
35 381 mlpy
36 375 pyopencl
37 368 dolfin
38 364 pyepl
39 342 pyevolve
40 336 fiat
41 329 getfem++
42 328 pywavelets
43 327 instant
44 317 dballe
45 316 python-visual
46 308 pynifti
47 291 nipy
48 265 pydicom
49 264 pandas
50 264 viper
51 256 joblib
52 252 swiginac
53 198 numexpr
54 193 magics++
55 191 biosig4c++
56 184 nipype
57 168 gamera
58 141 babel
59 135 pyfits
60 132 nibabel
61 130 dipy
62 126 fofix-dfsg
63 125 pysparse
64 125 symeig
65 120 openopt
66 115 astk
67 109 necpp
68 105 guiqwt
69 95 cfflib
70 92 pymca
71 92 pyqwt3d
72 91 pycuda
73 88 pycg
74 86 cclib
75 85 veusz
76 80 brian
77 76 mpi4py
78 73 basemap
79 73 ferari
80 61 pdb2pqr
81 60 nitime
82 60 pymvpa2
83 48 pebl
84 42 rdkit
85 40 libgetdata
86 40 pygpu
87 35 expeyes
88 33 python-biggles
89 32 gastables
90 28 uncertainties
91 22 pylibtiff
92 20 pysurfer
93 19 pygrace
94 14 spherepack
95 13 pyepr
96 12 lazyarray
97 12 libmpikmeans
98 11 numm
99 10 pyentropy
100 10 skimage
101 7 pytango
102 6 stimfit
103 3 cmor
104 2 neo
105 2 taurus

Paul Ivanov

unread,
May 26, 2012, 2:13:31 AM5/26/12
to numf...@googlegroups.com
Travis Oliphant, on 2012-05-25 07:15, wrote:
> I kind of like pyllar as well. But there is a 3D viz package
> for Python by the same name: http://pyllar.sourceforge.net/

Yeah, but it looks like it hasn't seen a release since March 2007, and
has been downloaded 21 times in the last year, 97 in the last two
combined.

http://sourceforge.net/projects/pyllar/files/stats/timeline?dates=2011-05-20+to+2012-05-26
http://sourceforge.net/projects/pyllar/files/stats/timeline?dates=2010-05-20+to+2012-05-26

best,
--
Paul Ivanov
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7

Aron Ahmadia

unread,
May 26, 2012, 3:03:34 AM5/26/12
to numf...@googlegroups.com

> I feel a little out of my league making recommendations on this list, but given my background I figure I could add some naming thoughts into the mix.
>
> I'd recommend against anything hard to pronounce (users need to verbalize it to share it in person), or confusing to spell (people who hear it need to type it).  I find that PyWord or WordPy naming is a fair approach, but can become unwieldy rather quickly. It often violates the above principle of simplicity unless the choice is obvious and descriptive.
>
> It's also worth thinking about naming relative to other language alternatives, rather than to the Python world. I mention this because 1-to-4 letter acronyms seem to be common in scientific software. Here's some fodder for thinking about that: http://en.wikipedia.org/wiki/List_of_numerical_analysis_software
>
> An acronym would let you stay boring but clear; SP for Scientific Python, as the most obvious example. And of course, mythology is a fine source for a simple brand name - here's some more fodder for that: http://en.wikipedia.org/wiki/Knowledge_deity

+1

Also, I recommend we follow Dag's example and split this thread out.  There are now 40 replies ranging over everything from naming to versioning to build systems to Hitler meme videos.

-A

Nathaniel Smith

unread,
May 26, 2012, 7:02:18 AM5/26/12
to numf...@googlegroups.com
On Sat, May 26, 2012 at 2:01 AM, Stéfan van der Walt <ste...@sun.ac.za> wrote:
> On Fri, May 25, 2012 at 4:55 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> I don't see how any solution which requires that people use a fixed
>> set of package versions is going to get off the ground.
>
> I don't think you should underestimate the value of such a set.  Being
> able to tell someone that they can reproduce your research / run your
> app using, e.g., Sciome 12, is much more appealing than guiding them
> through the combination of subpackages needed.

I think that if everyone actually were using the same set of packages,
that would have benefits, yes. I just can't see any way we can
actually convince people to do this. I'm certainly not going to stop
upgrading individual packages willy-nilly when it makes sense for my
personal needs...

> Also, such a versioned
> set can very easily be supported by EPD, Python(x,y), Debian, etc.
> without any additional infrastructure.

This just seems factually incorrect to me. Can you elaborate?

Debian unstable contains whatever the latest version of each package
is, as uploaded by the maintainers of each individual package. Debian
testing contains some algorithmically derived set of versions, based
on some automated QA stuff. Debian stable is a snapshot of testing
taken at some arbitrary moment, plus whatever bug fixes individual
maintainers upload. Where in this process do you think you can
convince them to support another set of package versions based on
NumFocus's arbitrary tagging? The best we could do would be to declare
that "Sciome 1" is whatever collection of versions ends up in the next
Debian stable. Which doesn't do much good for anyone not using Debian,
and not even that as soon as Debian uploads any bug-fixes.

There are certainly people here who are more knowledgeable about
Enthought's internal decision-making than I am. Do you think we can
dictate which versions of packages EPD should contain? When a large
customer calls up and says that they need the new version of FooPy
included, will Enthought say "no, Sciome says we should stick to the
previous version, sorry". Again, I guess we could just declare that
whatever Enthought does, that's what Sciome is. But I don't like that
idea.

> In fact, I would argue that, at least in the world of science,
> reproducible research *demands* a versioned package approach that
> we've been lacking for a long time.

I am a huge fan of reproducible research, but this trivializes the
problem. If I'm trying to reproduce someone else's experiment, the
last thing I'm worried about is being able to use a few less words to
describe a subset of their installed packages. That's swamped by the
problem of figuring out what the actual code they used was, what local
hacks they have, was their environment 32 or 64 bit and does that
affect anything, etc., etc.

At least if someone says they are using numpy 1.7.2, I can be pretty
sure that that's what they're using. If they say they're using Sciome
12, the first question is whether that's actually true, or did they
upgrade matplotlib to get some new chart type and then forget about
it...

A trivial tool for dumping out the versions of all installed packages
would IMHO be more useful for solving this problem, and doesn't
require herding everyone in the scientific python world into
coordinating.
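
For what it's worth, a bare-bones version of such a dump using the standard
pkg_resources working set (just an illustration; as comes up later in the
thread, pkg_resources has its own reliability problems):

import pkg_resources

# Print every distribution visible on sys.path with its recorded version.
for dist in sorted(pkg_resources.working_set,
                   key=lambda d: d.project_name.lower()):
    print("%s==%s" % (dist.project_name, dist.version))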

> The first obvious problem is that sub-package releases may not be in
> sync with Sciome / whatever.  But that's exactly why we have this
> conversation, and why those packages need to be on board.

As maintainer of some packages that might someday qualify for Sciome
inclusion[1][2], what benefits will I get from syncing my releases
with Sciome? I'm not going to tell people "here's a fix for that bug,
but don't install it, because then you'll be out of sync with Sciome".
A lot of the motivation for scikits in the first place is to let
people decouple their release management from scipy proper...

[1] https://code.google.com/p/scikits-sparse/
[2] https://github.com/charlton/charlton

Trying to convince everyone to follow some Grand Plan that doesn't
benefit them directly, or that doesn't benefit anyone unless everyone
gets on board, is one of the classic (and oh so tempting!) ways to
spend energy unproductively.

-- Nathaniel

Dag Sverre Seljebotn

unread,
May 26, 2012, 9:01:40 AM5/26/12
to numf...@googlegroups.com
I agree that anything like a 3-month or 6-month release schedule for
Sciome would be setting up for failure. But I think it could work if it
is always up to date. One should increment the Sciome version number
every time that both a) ONE of the core packages gets a new release, and
b) that release doesn't break the other packages.

So a new release of a leaf package like Pandas would by itself almost
always *immediately* bump the Sciome version number. But if NumPy 2.0
breaks backwards compatibility, then NumPy 2.0 is kept back until the
other core packages support it.

Yes, that gets you a lot of version numbers, but Sage has had bi-weekly
releases for some periods, and it still has the benefit that if somebody
says "Sage 3.23" I'll be able to know what set of packages that was.
Well said.

For instance, I think Fernando's idea of better documentation
integration is better served in a separate SEP for integration of
documentation across *any* participating package, without caring about
any core subset (or not) at all. Something like a "docs.cfg" in the
package, or that a package registers with IPython, or something.

But I think identifying a core subset of packages is useful as a way of
dealing with backwards compatibility breakage.

Dag

Dag Sverre Seljebotn

unread,
May 26, 2012, 9:55:05 AM5/26/12
to numf...@googlegroups.com
And sorry for not heeding my own advice!

My feeling is one needs to actually start some new threads, otherwise the pressure will just find its way out in this one.

I'll likely start a thread about CI when I can get around to it (it's been going round and round on both cython and numpy lists forever...); and I'm hoping Fernando could start a thread on documentation integration...

Dag
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Nathaniel Smith

unread,
May 26, 2012, 12:05:35 PM5/26/12
to numf...@googlegroups.com
Hmm, interesting idea. One could have a little bot watching PyPI and
bumping the version number whenever it noticed a new release of a
relevant package...

How do you know whether a new release breaks other packages, though? I
guess you build all of them, on all supported systems, and run the
tests? ...Doesn't that make you into a distributor? (Like, at that
point, why not just stick the QA'ed binaries into an installer and put
it on a web page?)

This also assumes that everyone upgrades everything in lockstep...
maybe I'm weird, but I tend to upgrade either when I upgrade my
distro, or when I need a specific feature. The result is that in my
actual day-to-day python environment right now, I have numpy 1.5.1 and
scipy 0.8.0 (released 2010) coexisting with ipython 0.12 and pandas
0.7.2 (<6 months old). Ironically, the reason I do this is basically
the reproducible research argument -- if I upgraded everything all the
time then I'd probably have version skew breakage to deal with all the
time.

What do you do if numpy makes 1.7 a long-term release, so eventually
we have 1.8.x and 1.7.x releases coming out intermixed?

- N

Stéfan van der Walt

unread,
May 26, 2012, 7:46:38 PM5/26/12
to numf...@googlegroups.com
On Sat, May 26, 2012 at 4:02 AM, Nathaniel Smith <n...@pobox.com> wrote:
>> Also, such a versioned
>> set can very easily be supported by EPD, Python(x,y), Debian, etc.
>> without any additional infrastructure.
>
> This just seems factually incorrect to me. Can you elaborate?

In retrospect, I think you're right. Even if we could get Debian to
tag the versions, it would mean that a user would get a very specific
version of the Sciome, but not any arbitrary version--that defeats the
purpose. Perhaps, then, it is better to invest time in providing a
build system that can combine arbitrary versions of packages together,
so that you could do "build sciome 06cf77e" and have the exact same
setup (version-wise, at least) as another researcher (and this sounds
like what Dag was talking about).

Stéfan

Travis Oliphant

unread,
May 26, 2012, 9:39:11 PM5/26/12
to numf...@googlegroups.com
>> Yes, that gets you a lot of version numbers, but Sage has had bi-weekly
>> releases for some periods, and it still has the benefit that if somebody
>> says "Sage 3.23" I'll be able to know what set of packages that was.
>
> Hmm, interesting idea. One could have a little bot watching PyPI and
> bumping the version number whenever it noticed a new release of a
> relevant package...
>
> How do you know whether a new release breaks other packages, though? I
> guess you build all of them, on all supported systems, and run the
> tests? ...Doesn't that make you into a distributor? (Like, at that
> point, why not just stick the QA'ed binaries into an installer and put
> it on a web page?)

This would be one approach and a hopeful eventual outcome of a serious CI effort. It would be great if NumFOCUS received enough support to provide a backbone here. But this will take some significant effort and evangelizing.

>
> This also assumes that everyone upgrades everything in lockstep...
> maybe I'm weird, but I tend to upgrade either when I upgrade my
> distro, or when I need a specific feature. The result is that in my
> actual day-to-day python environment right now, I have numpy 1.5.1 and
> scipy 0.8.0 (released 2010) coexisting with ipython 0.12 and pandas
> 0.7.2 (<6 months old). Ironically, the reason I do this is basically
> the reproducible research argument -- if I upgraded everything all the
> time then I'd probably have version skew breakage to deal with all the
> time.
>
> What do you do if numpy makes 1.7 a long-term release, so eventually
> we have 1.8.x and 1.7.x releases coming out intermixed?
>


One approach is to go the route of "crowd-sourcing" and basically make a version of <meta-package> for every configuration that is actually in use. In other words, anyone with a testable configuration could run a test-suite --- successful completion of which created a Sciome tag that was uploaded to a central location showing the tested configuration.

In that way, no matter what combination you are using you could give it a meta-package name and others could refer to it.
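
As a toy illustration of what such a crowd-sourced tag could be (the tag
scheme and package list below are invented, not an actual Sciome mechanism),
one could simply hash the exact set of tested versions into a short
identifier that others can refer to:

import hashlib

def config_tag(versions):
    # versions: dict mapping package name -> version string of a tested setup.
    canonical = ",".join("%s==%s" % item for item in sorted(versions.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]

print(config_tag({"numpy": "1.6.2", "scipy": "0.10.1", "ipython": "0.12.1"}))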

-Travis







Ilan Schnell

unread,
May 26, 2012, 10:50:29 PM5/26/12
to numf...@googlegroups.com
I like the name sciome, but the conflict with sciome.com is not
a good thing, and could cause problems in the future. I don't
like SPD or SciPD as much, because the higher goal is more about
a scientific environment and build/deploy ecosystem than a
Python distribution as such.

I don't think that a 3 to 6 month release schedule would be such
a bad thing. This has worked quite well in the past for EPD.
A schedule in which any update to any packages would trigger a
new release would be too frequent, and anything beyond 6 month
would be to infrequent, I think. The ecosystem should always
allow users to specify, say sciome 1.3 + pandas 0.7.3, or even
dump out the complete list of exact package names and versions.

We also have to remember that not all packages in sciome are
Python packages. For example, what about HDF5?
Adding a "standard sciome" configuration file (like bento) to
all these packages is therefore not feasible. The approach taken
in EPD to solve this problem was to have "build recipes", which would
then control the build for the particular platform at hand.
A solution in which all packages must have a bento file would be
too inflexible.

Also, I only see use in the sciome meta-package if tested binary
packages are available for all projects on all platforms because,
as Wes pointed out, building many packages is non-trivial. Improvements
can be made over time to ease the build process, but to get something
useful off the ground (in order to get sciome, or whatever the name may
be, established) you need binary packages, and once you have those,
as Nathaniel mentions, one might as well put them into an installer
on a web page.

Regarding Nathaniel's question about Enthought's internal decision-making
process, we (NumFocus) cannot (and should not) dictate to them which
versions of which packages to include in EPD, or even expect them to add
a specific package. Even though I'm still (more or less) in charge of
EPD, I have little control over management decisions.


- Ilan

Nathaniel Smith

unread,
May 27, 2012, 1:17:47 PM5/27/12
to numf...@googlegroups.com
Does this do what you want?
pip freeze >packages.txt
pip install -E my-collaborator-env -r packages.txt
Works for almost any python package, sciome-related or not. Does
require a working Python (from any source) and build
environment.

- N

Chris Kees

unread,
May 28, 2012, 1:13:48 AM5/28/12
to numf...@googlegroups.com
I guess this thread is winding down, but  I'll throw in my two cents:

By meta-package I hope we mean just a set with a name and a version number, where the elements of the set are simply package names + version numbers that the community agrees should be in the meta-package. It's a standard.  If that's the kind of meta-package you all mean, then I really wish you could do what it takes (including waiting until the time is right) to call it SciPy or ScientificPython, and make it a result of a process that will ensure it represents the needs of a broad scientific community. 

I can see the utility of such a standard, but what I need much more than a meta-package are the products of hard work that would be planned in the other threads that Dag suggested (doc standards, distribution tools, build tools, CI,...).

Chris




Ralf Gommers

unread,
May 28, 2012, 6:10:37 AM5/28/12
to numf...@googlegroups.com

"pip freeze" only works for packages installed through pip - not a reasonable requirement - and also is broken because it doesn't do any checks on the reported version actually being the used one. Apparently I'm using numpy 1.2.1 ....

Ralf

Nathaniel Smith

unread,
May 28, 2012, 6:39:42 AM5/28/12
to numf...@googlegroups.com
Curious. I checked both my debian-installed version of numpy, and a
version I had installed in a virtualenv with plain vanilla 'setup.py
install', and 'pip freeze' reported the correct version in both cases.

I'm pretty sure it uses the standard package metadata stuff defined in
PEP-345 and friends, rather than anything pip-specific. I guess
there's some confusion about how exactly one should find this stuff
which PEP-376 is working on cleaning up, but it does work for me, both
with pip and other tools like yolk.

So I'm not sure how you managed to get that... sounds like a bug somewhere.

- N

Ralf Gommers

unread,
May 28, 2012, 9:51:55 AM5/28/12
to numf...@googlegroups.com

Updated to latest distribute and pip, which improves the output - but it's still wrong. The reason is that pkg_resources.find_distributions() doesn't handle packages that were built in-place correctly. I'll file a bug.

Looking at what's going on in pkg_resources, I think it's safe to assume that writing our own simple tool that does what "pip freeze" tries to do for a given set of packages will be far more robust. If you have the package names, you can simply try to import each one and parse its version string/tuple.
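
A minimal sketch of that import-based approach (the package list here is
just an example):

import importlib

def report_versions(names):
    for name in names:
        try:
            mod = importlib.import_module(name)
            print("%s %s" % (name, getattr(mod, "__version__", "unknown")))
        except ImportError:
            print("%s not installed" % name)

report_versions(["numpy", "scipy", "matplotlib", "IPython", "pandas"])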

Ralf

Vagabond_Aero

unread,
May 28, 2012, 9:57:33 AM5/28/12
to numf...@googlegroups.com
I've been reading this discussion about naming the meta-package, trying to come up with something which, to me, gives a clue about what the package is for.  It seems to me we're trying to describe a scientific tool stack, and when I did a search on that phrase, I found Travis had used it in the title of his blog:


I really like the idea of including "py" in the name somehow, to keep the python connection foremost in the mind of people searching for this capability.  Something along the lines of STSpy or PySTS, but those may not be explicit enough.  

For some reason, Sciome does not give me an idea of a set of tools to build my projects around.  With the proper advertising, it certainly could.

My 2 cents worth... 

Bruce

--
Co-discoverer of KBO: IH-X-694190


Nathaniel Smith

unread,
May 28, 2012, 10:44:43 AM5/28/12
to numf...@googlegroups.com
Honest question: as a developer, how do I provide a version string
robustly? Right now none of my packages have __version__ variables,
exactly because the package metadata is more standard and -- since
it's generated automatically from setup.py -- more reliable than
something I hacked up quickly.

- N

Ralf Gommers

unread,
May 28, 2012, 12:04:54 PM5/28/12
to numf...@googlegroups.com

You could do worse than copy the way numpy does it. Scipy, pandas, statsmodels and scikits-image all did that (look for write_version_py() in setup.py). That allows you to for example check for dependencies like statsmodels does in its setup.py:

def check_dependency_versions(min_versions):
    """
    Don't let setuptools do this. It's rude.

    Just makes sure it can import the packages and if not, stops the build
    process.
    """
    from distutils.version import StrictVersion
    try:
        from numpy.version import short_version as npversion
    except ImportError:
        raise ImportError("statsmodels requires numpy")
    try:
        from scipy.version import short_version as spversion
    except ImportError:
        raise ImportError("statsmodels requires scipy")
    try:
        from pandas.version import version as pversion
    except:
        raise ImportError("statsmodels requires pandas")

 
> Right now none of my packages have __version__ variables,
> exactly because the package metadata is more standard and -- since
> it's generated automatically from setup.py -- more reliable than
> something I hacked up quickly.

Why wouldn't you add a __version__ or version? How would I check the installed version of your packages from within my package/script?

Note that package metadata is not robust as my previous emails showed, and it can't be as robust as a good DIY scheme simply because pip & co solve a much harder problem than we need to solve here (pip doesn't know the names of packages beforehand).

If you want to argue that the numpy version handling can be simplified and that we should standardize versioning better for the scientific python stack, you are probably right though.
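
For reference, the write_version_py() pattern mentioned above boils down to
something like the following (a simplified sketch, not numpy's actual code;
'mypackage' is a placeholder). setup.py writes a tiny module into the package
at build time, so the installed package always reports the version it was
built from:

VERSION = "0.1.0"

def write_version_py(filename="mypackage/version.py"):
    # Called from setup.py before setup() runs; mypackage.version then
    # imports the generated file at runtime.
    with open(filename, "w") as f:
        f.write("# THIS FILE IS GENERATED BY setup.py -- do not edit\n")
        f.write("version = '%s'\n" % VERSION)
        f.write("short_version = '%s'\n" % VERSION)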

Ralf

Nathaniel Smith

unread,
May 28, 2012, 2:19:40 PM5/28/12
to numf...@googlegroups.com
Unfortunately, distutils is such a piece of fragile, inflexible junk
that all of these solutions have substantial complications and
limitations. You have to auto-generate a checked-in file, or you have
to

>> Right now none of my packages have __version__ variables,
>> exactly because the package metadata is more standard and -- since
>> it's generated automatically from setup.py -- more reliable than
>> something I hacked up quickly.
>
> Why wouldn't you add a __version__ or version? How would I check the
> installed version of your packages from within my package/script?
>
> Note that package metadata is not robust as my previous emails showed, and
> it can't be as robust as a good DIY scheme simply because pip & co solve a
> much harder problem than we need to solve here (pip doesn't know the names
> of packages beforehand).

pkg_resources.get_distribution("scikits.sparse").version does work
right now, modulo bugs, and doesn't try to solve the harder problem
you mention. It also works the same for every package. Maybe we could
all get our act together enough to implement a standard solution for
all the scientific packages, but even if we did, it'd still fall over
as soon as someone wanted to use lxml or django or BeautifulSoup.

But, I thought Stefan actually did want to solve the "harder problem"
-- he wants to be able to reproduce one researcher's environment on
another researcher's machine. That either requires making an
exhaustive list of every package that any researcher anywhere will
ever install, or else doing some pip-style searching. I assume that
sciome -- whatever it might be :-) -- would not include every
astronomy package, climate modelling package, MCMC package, machine
vision package, web scraper, etc. etc.

I did look at the pkg_resources code, and it's a total mess of
spaghetti. But IMHO we're still better off figuring out how to make it
work reliably for our packages (which is probably doable even if it
can't be made reliable in general) than trying to replicate it in our
own walled garden.

I guess my general perspective here is, a huge part of Python's draw
for me is exactly that it lets me use one set of interoperable tools
to do my text file munging, web page serving, experiment presentation,
simulation running, natural language processing, and statistical data
analysis. Obviously Matlab and R get plenty of advantages from their
ability to control their whole environment, and it'd be nice if we
could have those advantages too (and an infix dot product operator
while we're at it). But, we don't -- we're just one piece of a huge
Python ecosystem. If we try to replicate the Matlab/R model, then
we'll always be playing catch-up. OTOH, if we can turn that ecosystem
into a selling point, then they'll be playing catch-up with us.
Virtualenv, say, is a fabulous tool for reproducible research, but it
was invented by web developers. So I think we should try to integrate
with the non-scientific Python stack whenever possible. Obviously the
"whenever possible" part is a big caveat, but for package metadata it
seems doable (if perhaps annoying).

- N

Dag Sverre Seljebotn

unread,
May 28, 2012, 2:45:19 PM5/28/12
to numf...@googlegroups.com
"Walled garden" is not intrinsically nice, but neither is trying to get
somewhere with the brakes slammed on. There must be some balance.

I think the feeling is mutual. At PyData, Guido's comment after hearing
10% of our problems in the build&distribution area was "well, perhaps
you just need to go and do your own thing then".

Dag

Nathaniel Smith

unread,
May 28, 2012, 3:08:39 PM5/28/12
to numf...@googlegroups.com
I know there are hard problems here, but I don't see how arranging for
there to be a file called PKG-INFO in or near your installed package
is one of them.

- N

David Cournapeau

unread,
May 28, 2012, 3:19:26 PM5/28/12
to numf...@googlegroups.com
Having a file "near" your installed package is fragile, and more importantly non-explicit. For example, since all those tools have no uninstallation feature, it is easy to have a stale egg-info compared to the actual package. pkg_resource is also dog-slow, and slow down imports significantly.

The solution I have in bento to deal with this is quite simple:
  - the bento.info file contains the version string
  - the build stage populates a simple python file with the version in it. Dealing with non-built packages (for pure-python ones) can be handled by a simple try/except ImportError guard around this generated file.

It is simple, easy to understand and trace back if it fails, and does not depend on a 4000 LOC abomination :)
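
Roughly, the guard David describes looks like this in the package's
__init__.py (file and variable names here are illustrative, not bento's
actual layout):

# in mypackage/__init__.py
try:
    # _generated_version.py is written by the build stage from bento.info
    from mypackage._generated_version import version as __version__
except ImportError:
    # running from an un-built source tree
    __version__ = "unknown"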

David

Nathaniel Smith

unread,
May 28, 2012, 3:37:44 PM5/28/12
to numf...@googlegroups.com
That does sound lovely, and I hope you have a strategy for migrating
the Python world to bento, because goodness knows what we have now is
not there yet. OTOH IIUC bento isn't quite ready for that yet.

But all I'm saying here is that if you want a way to dump the package
configuration from a Python install, probably the way to do that is
with a generic Python package management tool, not an ad-hoc script
that can only handle its built-in database of "sciome" packages.

- N

Travis Oliphant

unread,
May 28, 2012, 3:59:43 PM5/28/12
to numf...@googlegroups.com
>>
>>
>> Having a file "near" your installed package is fragile, and more importantly
>> non-explicit. For example, since all those tools have no uninstallation
>> feature, it is easy to have a stale egg-info compared to the actual package.
>> pkg_resource is also dog-slow, and slow down imports significantly.
>>
>> The solution I have in bento to deal with this is quite simple:
>> - the bento.info file contains the version string
>> - the build stage populates a simple python file with the version in it.
>> Dealing with non-built packages (for pure-python ones) can be handled by a
>> simple try/except ImportError guard around this generated file.
>>
>> It is simple, easy to understand and trace back if it fails, and does not
>> depend on a 4000 LOC abomination :)
>
> That does sound lovely, and I hope you have a strategy for migrating
> the Python world to bento, because goodness knows what we have now is
> not there yet. OTOH IIUC bento isn't quite ready for that yet.
>
> But all I'm saying here is that if you want a way to dump the package
> configuration from a Python install, probably the way to do that is
> with a generic Python package management tool, not an ad-hoc script
> that can only handle its built-in database of "sciome" packages.
>

I relabeled this thread....

I'm pretty sure we all appreciate Python's strength at allowing us to sample from an ecosystem of non-scientific packages. Obviously, any system that we adopt in the scientific sphere would have to also support installation of "non-scientific" packages.

However, that strength is actually a weakness when it comes to building and packaging, because it has meant that we assumed the rest of the Python community would help us out there, and it really didn't: most of the rest of the Python community doesn't have our use-case of needing to build and distribute packages with a lot of native code.

Many of us have been hoping for a long time that distutils and friends would sort themselves out, and David has spent tireless hours discussing things with python-devs who just don't get the use-case and think they have it solved --- but don't. He has come up with some compelling approaches that are a very good start, pretty far in the right direction from what I can see.

It is very clear to all of us (especially after Guido basically confirmed it to us at the PyData conference in the Fall) that:

1) We need a better build tool than distutils for scientific packaging. Bento + (waf, cmake, ...) looks like a good start
2) We need better package management tools and a repository for bento-packaged packages.

#1 is really not controversial as it's very simple to convert a setup.py file to use bento (we will still use the same interfaces that the rest of the Python ecosystem uses). I've looked at David's system as someone who is very familiar with the distutils infrastructure: it's very nice in comparison and just needs manpower to get things going.

#2 might seem unnecessary because of the existence of PyPI, but I would maintain that we need to have our own with better metadata and better support for auto-build of binaries for authors who register their packages with bentoPI. In fact, I would argue that bento should provide mechanisms for registering with *both* PyPI and bentoPI by default.

Best,

-Travis




Dag Sverre Seljebotn

unread,
May 28, 2012, 4:05:25 PM5/28/12
to numf...@googlegroups.com
Of course, even that isn't enough for the full picture. Which LAPACK, SuiteSparse, OS, and C compiler you used matters as well. Whatever Sciome and the thread topic were about, they were not about this.

BTW, there is a tool 'Sumatra' which approaches reproducible research from this angle.

David Cournapeau

unread,
May 28, 2012, 4:05:19 PM5/28/12
to numf...@googlegroups.com
On Tue, May 29, 2012 at 4:37 AM, Nathaniel Smith <n...@pobox.com> wrote:


> But all I'm saying here is that if you want a way to dump the package
> configuration from a Python install, probably the way to do that is
> with a generic Python package management tool, not an ad-hoc script
> that can only handle its built-in database of "sciome" packages.

There is an obvious chicken-and-egg issue here. Finding a good solution is difficult, but mainly from a social POV. I think the problem is to find a local optimum within the following constraints:
  - forget about the existing distutils/distribute/pkg_resource/etc… ecosystem as a basis to build upon. Justification: they are all based on the "let's throw away all the metadata when we have them, and let's recreate them with magic and guessing when we need them" principle, which cannot possibly work reliably.
  - interoperate with the existing ecosystem: our solution should be "spreadable", and not prevent any existing workflow. This is why bento allows for a setup.py to exist, and can work within virtualenv, with pip, etc… In those contexts, of course, you don't benefit from the advantages of our hopefully better solution.
  - an ability to "import" an existing deployment into a "sciome" db. I have not thought much about this one.

Also, I don't think we need to aim at converting 1000s of python packages. EPD and python(x, y) are popular, and they support what, around 100 packages max? Note that a lot of packages are simple enough that the convert method from setup.py to bento.info should work well.

David

Nathaniel Smith

unread,
May 28, 2012, 4:25:22 PM5/28/12
to numf...@googlegroups.com
On Mon, May 28, 2012 at 9:05 PM, David Cournapeau <cour...@gmail.com> wrote:
> On Tue, May 29, 2012 at 4:37 AM, Nathaniel Smith <n...@pobox.com> wrote:
>> But all I'm saying here is that if you want a way to dump the package
>> configuration from a Python install, probably the way to do that is
>> with a generic Python package management tool, not an ad-hoc script
>> that can only handle its built-in database of "sciome" packages.
>
> There is an obvious chicken-and-egg issue here. Finding a good solution is
> difficult, but mainly from a social POV. I think the problem is to find a
> local optimum within the following constrains:
>   - forget about the existing distutils/distribute/pkg_resource/etc…
> ecosystem as a basis to build upon. Justification: they are all based on
> "let's throw all the metadata when we have them, and let's recreate them
> with magic and guessing when we need them" principle that cannot possible
> work reliably.
>
>   - interoperate with the existing ecosystem: our solution should be
> "spreadable", and not prevent any existing workflow. This is why bento
> allows for a setup.py to exist, and can work within virtualenv, with pip,
> etc… In those contexts, of course, you don't benefit from the advantages of
> our hopefully better solution.

The 'bentomaker convert' and setup.py-calling-bento code is great for
letting people migrate to bento-the-build-system. But IMO
bento-the-install-system really needs to also be able to run in a kind
of "legacy mode", where "bento install my-pypi-package" works even if
my-pypi-package uses distutils. It shouldn't be hard, really -- just
run setup.py install with some --install switches to divert the files
to a known directory (like the old "make install DESTDIR=..." trick),
then suck out the egg info and install files, make sure they have the
metadata you want, and do the install. That way if you can throw in a
few killer features (reliable uninstall would be a start, but maybe a
ccache-like mode, for people who are building and tearing down
virtualenvs all day?), then people who don't care at all about
scientific python will be able and willing to switch.
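
A rough sketch of that diversion step, assuming it is run from an unpacked
source tree containing a setup.py (plain-distutils switches shown; as David
notes later in the thread, setuptools/distribute packages need different
handling):

import os
import subprocess
import tempfile

staging = tempfile.mkdtemp(prefix="staged-install-")
# Build and install the package, but root all files under the staging dir.
subprocess.check_call(["python", "setup.py", "install", "--root", staging])

# The staging tree now mirrors the real filesystem layout; an installer can
# read the .egg-info metadata there, record the file list, and copy the
# files into place itself (and so know how to uninstall them later).
for dirpath, dirnames, filenames in os.walk(staging):
    for fn in filenames:
        print(os.path.join(dirpath, fn))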

Perhaps you're already planning to do that.

- N

David Cournapeau

unread,
May 28, 2012, 5:01:31 PM5/28/12
to numf...@googlegroups.com
I don't have this feature on purpose: as soon as you run setup.py as part of the usual setup, you're re-importing everything that is wrong with distutils. Doing so would force me to give up the bento advantages that make it possible to implement "killer" features. Also, any package simple enough that something like 'python setup.py install_into_sandbox' could work would also work if first automatically converted to the bento format.

My general strategy is to make it easy to convert to bento, have enough features *at the package* level to make it worthwhile to be based on bento instead of distutils ( parallel build, parallel 2to3, easy "make doc", installation of doc), and work within the existing solutions.

I also know frustration with distutils goes beyond scipy: I believe twisted community hates it as much as we do for example. I also think distutils2 will open a window, because it will force people to convert anyway, and I think bento today has more compelling features *at the package level*.

David

Nathaniel Smith

unread,
May 28, 2012, 5:18:46 PM5/28/12
to numf...@googlegroups.com
Really? With current numpy, 'python setup.py install --prefix=...'
works fine, and leaves you with a set of files to be installed plus a
.egg-info, and the .egg-info seems to have plenty of metadata in it.
(Others might not, but this one does.) Why couldn't bento extract the
metadata it needs and then install those files in a managed way? What
bento advantages are lost, assuming a package that would otherwise
just be uninstallable?

(I'm assuming that bentomaker convert could not automatically handle
numpy's setup.py ;-).)

> My general strategy is to make it easy to convert to bento, have enough
> features *at the package* level to make it worthwhile to be based on bento
> instead of distutils ( parallel build, parallel 2to3, easy "make doc",
> installation of doc), and work within the existing solutions.
>
> I also know frustration with distutils goes beyond scipy: I believe twisted
> community hates it as much as we do for example. I also think distutils2
> will open a window, because it will force people to convert anyway, and I
> think bento today has more compelling features *at the package level*.

Yes, but really you have two tools here -- bento-the-build-tool, which
is useful for developers, and bento-the-package-manager, which is
useful for users. Except that it sounds like you're saying it won't be
useful to users until every package that they ever want to install has
switched, which seems like it means "never".

-- Nathaniel

Nathaniel Smith

unread,
May 28, 2012, 5:57:57 PM5/28/12
to numf...@googlegroups.com
...it occurs to me that this probably sounds more negative than I
mean. Mostly I am just disappointed, it's like PyPy -- you're dangling
all kinds of cool advantages, but so long as there are a few packages
that aren't supported, I can't use it at all :-(.

-N

Anthony Scopatz

unread,
May 28, 2012, 6:11:16 PM5/28/12
to numf...@googlegroups.com


On Mon, May 28, 2012 at 10:57 PM, Nathaniel Smith <n...@pobox.com> wrote:

[snip]
 
>> Yes, but really you have two tools here -- bento-the-build-tool, which
>> is useful for developers, and bento-the-package-manager, which is
>> useful for users. Except that it sounds like you're saying it won't be
>> useful to users until every package that they ever want to install has
>> switched, which seems like it means "never".
>
> ...it occurs to me that this probably sounds more negative than I
> mean. Mostly I am just disappointed, it's like PyPy -- you're dangling
> all kinds of cool advantages, but so long as there are a few packages
> that aren't supported, I can't use it at all :-(.

This seems overly absolute.  The engineer in me argues for usefulness
over purity.  If this means that the package manager ends up using distutils
for older, unsupported, non-converted packages I think that this should 
be an included feature.  We would not be saying that these packages 
*would* work, but we would at least try. 

In my mind this goes along with the bleeding edge package feature, where 
git or hg  repos are checked out from public repos.  Nice and useful to have
though maybe not solely within the core framework.

Also this discussion should probably be moved to the other thread.

Be Well
Anthony
 


Peter Wang

unread,
May 28, 2012, 6:51:50 PM5/28/12
to numf...@googlegroups.com
On Mon, May 28, 2012 at 2:59 PM, Travis Oliphant <teoli...@gmail.com> wrote:
> It is very clear to all of us (especially after Guido basically confirmed it to us at the PyData conference in the Fall) that:

Yes, I'd like to reiterate this. After explaining our predicament to
Guido, he basically told us to go make something better. So, he has
no particular qualms with us taking a non-distutils approach to
packaging for our ecosystem.

> #2 might seem unnecessary because of the existence of PyPI, but I would maintain that we need to have our own with better metadata and better support for auto-build of binaries for authors who register their packages with bentoPI.   In fact, I would argue that bento should provide mechanisms for registering with *both* PyPI and bentoPI by default.

Oh come now - surely the "repository of Bento packages" should be
named The Bento Box?!

-Peter

Anthony Scopatz

unread,
May 28, 2012, 6:55:03 PM5/28/12
to numf...@googlegroups.com
+1!
 


Travis Oliphant

unread,
May 28, 2012, 7:09:14 PM5/28/12
to numf...@googlegroups.com
Perhaps we can all agree on at least one name after all :-)

-Travis

David Cournapeau

unread,
May 28, 2012, 9:45:04 PM5/28/12
to numf...@googlegroups.com
I think bentoyasan (弁当屋), i.e. bento shop, is more appropriate :) A bento box usually only includes one bento.

David

David Cournapeau

unread,
May 28, 2012, 10:17:17 PM5/28/12
to numf...@googlegroups.com
There are numerous issues with this approach:
  - python setup.py install --prefix works only for distutils packages. setuptools/distribute requires something else
  - you don't know which files are what: what is configuration, what is doc, which are python packages, etc… egg-info does not contain this information.
  - the big one: there is no enforced metadata. While this is not implemented in bento yet either, the plan is to have a set of enforced metadata if you want to be able to install it.

The approach you are suggesting looks very attractive at first, but it is also what makes setuptools/easy_install unreliable in my opinion.


>> My general strategy is to make it easy to convert to bento, have enough
>> features *at the package* level to make it worthwhile to be based on bento
>> instead of distutils ( parallel build, parallel 2to3, easy "make doc",
>> installation of doc), and work within the existing solutions.
>>
>> I also know frustration with distutils goes beyond scipy: I believe twisted
>> community hates it as much as we do for example. I also think distutils2
>> will open a window, because it will force people to convert anyway, and I
>> think bento today has more compelling features *at the package level*.
>
> Yes, but really you have two tools here -- bento-the-build-tool, which
> is useful for developers, and bento-the-package-manager, which is
> useful for users. Except that it sounds like you're saying it won't be
> useful to users until every package that they ever want to install has
> switched, which seems like it means "never".

While you are right there are two things here, they are linked together. What makes bento-the-package-manager easy/reliable is that it can use bento-the-build-tool that is reliable.

I also think that if we can start "small" with say a distribution of a few tens of packages, we can get pretty far (using real-life example of EPD/python(x,y)). The difference with pypy is that you can develop a bento package without losing any ability to run on python, and you don't need to rewrite much in most cases. This allows coexistence, without importing all distutils/etc… flaws into a new solution.

One thing that would be interesting is to see how many packages can be converted automatically to bento and still pass their test suite. I could certainly envision a service hooked up to github and pypi that would allow us to do this.

David

Ilan Schnell

unread,
May 29, 2012, 1:33:30 AM5/29/12
to numf...@googlegroups.com
I like bento shop, because of the analogy with cheese shop.

- Ilan

Didrik Pinte

unread,
May 29, 2012, 3:41:20 AM5/29/12
to numf...@googlegroups.com
On 26 May 2012 03:01, Stéfan van der Walt <ste...@sun.ac.za> wrote:
> On Fri, May 25, 2012 at 4:55 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> I don't see how any solution which requires that people use a fixed
>> set of package versions is going to get off the ground.
>
> I don't think you should underestimate the value of such a set.  Being
>> able to tell someone that they can reproduce your research / run your
> app using, e.g., Sciome 12, is much more appealing than guiding them
> through the combination of subpackages needed.  Also, such a versioned
> set can very easily be supported by EPD, Python(x,y), Debian, etc.
> without any additional infrastructure.
>
> In fact, I would argue that, at least in the world of science,
> reproducible research *demands* a versioned package approach that
> we've been lacking for a long time.
>
> The first obvious problem is that sub-package releases may not be in
> sync with Sciome / whatever.  But that's exactly why we have this
> conversation, and why those packages need to be on board.

I've been thinking about this meta-package idea and the potential ways of
leveraging the huge amount of work related to any Python distribution
seen so far (Debian, EPD or others). Getting clear version sets
related to a meta-package would be very easy to add to EPD (at least
looking at the initial list of packages).

I think we (Enthought) could potentially keep and maintain binary
installers under the Academic license with the content of this
meta-package. We could keep the history of installers for years, with a
clear naming system that publications could reference to get direct
access to the needed environment. The idea of an AMI for EC2 would be
very easy to do too. This type of effort would be perfectly in line with
our view on supporting academic/research needs.

-- Didrik

Nathaniel Smith

unread,
May 29, 2012, 4:45:02 AM5/29/12
to numf...@googlegroups.com
Not sure what you mean -- I just ran

python -c 'import setuptools; setuptools.setup()' install --help

and I got the exact same (long and confusing) list of installation
target options that I get from distutils. (--prefix, --root,
--install-{lib,headers,scripts,data}, etc.)

>   - you don't know which files are what: what is configuration, what is doc,
> which are are python packages, etc… egg-info does not contain this
> information.

Fortunately, setuptools/distutils make it almost impossible to handle
configuration and doc in any reasonable way, so this probably doesn't
come up much... and worst case they end up mixed in with the package
source, like they do now. Sub-optimal, but it seems like you'd still
get all the other advantages of bento-the-package-manager: fast and
reliable access to metadata, reliable uninstall, etc. It should even
be possible to turn these things into binary installers; I get the
impression that supporting binary install is another goal of bento?

>   - the big one: there is no enforced metadata. While this is not
> implemented in bento yet either, the plan is to have a set of enforced
> metadata if you want to be able to install it.

Right, which is why I keep saying that you can reject packages that
are missing critical metadata :-). This is why I'm *not* suggesting
that you just let setup.py do the installation, but instead divert it
to a temporary directory, so you can still control the actual
deployment.

Most packages do, in practice, have reasonable metadata, and for the
ones that don't, that's a legitimate bug that the authors will
probably be open to fixing, even if they don't care about bento. So
this strategy lets the bento-installer ecosystem grow from a tiny
handful of installable packages to basically all of the important
ones, overnight. 'pip' isn't 100% successful at installing packages
either, but 95% of the time it just works, so people use it.

> The approach you are suggesting looks very attractive at first, but it is
> also what makes setuptools/easy_install unreliable in my opinion.
>
>
>> > My general strategy is to make it easy to convert to bento, have enough
>> > features *at the package* level to make it worthwhile to be based on
>> > bento
>> > instead of distutils ( parallel build, parallel 2to3, easy "make doc",
>> > installation of doc), and work within the existing solutions.
>> >
>> > I also know frustration with distutils goes beyond scipy: I believe
>> > twisted
>> > community hates it as much as we do for example. I also think distutils2
>> > will open a window, because it will force people to convert anyway, and
>> > I
>> > think bento today has more compelling features *at the package level*.
>>
>> Yes, but really you have two tools here -- bento-the-build-tool, which
>> is useful for developers, and bento-the-package-manager, which is
>> useful for users. Except that it sounds like you're saying it won't be
>> useful to users until every package that they ever want to install has
>> switched, which seems like it means "never".
>
> While you are right that there are two things here, they are linked together.
> What makes bento-the-package-manager easy/reliable is that it can use
> bento-the-build-tool, which is reliable.

It might be that I've misunderstood, because none of the advantages I've
seen described for bento-the-package-manager seem to depend on how
packages are built (apt-get doesn't care either; even dpkg-build
doesn't really care). Certainly they'll work better together, but
that's different...

> I also think that if we can start "small" with, say, a distribution of a few
> tens of packages, we can get pretty far (using the real-life example of
> EPD/python(x,y)). The difference with pypy is that you can develop a bento
> package without losing any ability to run on python, and you don't need to
> rewrite much in most cases. This allows coexistence, without importing all
> the distutils/etc… flaws into a new solution.

EPD/python(x,y) are awesome, but they're pretty niche solutions within
the broader python ecosystem. Certainly they're no use to me
personally -- I'm going to keep using debian's python :-). If you want
bento to be a niche tool that coexists with pip etc. indefinitely,
then I certainly can't stop you. But you should think bigger!
Attracting a broad user-base takes hard work and some slogging through
the mud, but the marginal effort required seems much lower than the
costs from fragmentation. (And of course, that's easy to say when I'm
not doing the work ;-).)

-N

Dag Sverre Seljebotn

unread,
May 29, 2012, 7:05:05 AM5/29/12
to numf...@googlegroups.com
If you want to use Debian, wouldn't you want to use .deb's for your
installation needs? IIRC, one of Bento's goals was to make it easier to
create repositories with Debian packages, Ubuntu packages, etc.; so that
"The Bento Shop" would simply provide package repositories for the
various Linux distributions as one of the distribution mechanisms.

(But that information may be out of date...David?)

Dag

Nathaniel Smith

unread,
May 29, 2012, 8:10:48 AM5/29/12
to numf...@googlegroups.com
I don't use Debian because I like .deb's, I use Debian because I like
their QA and integration :-). I don't want 'apt-get upgrade' to be
upgrading all my Python packages to the bleeding edge, and I
definitely don't want it doing that globally, or for pre-release
versions, etc. (And I suspect the sysadmin in our lab has an even
stronger opinion about this.) Plus there's the standard set of
problems that third-party deb repositories are prone to: random
non-debian specialists can't be trusted to properly handle version
numbering, package name transitions, etc., which all have to be kept
in some kind of sync between the third-party repo and the distro repo,
or else you end up breaking release upgrades, or third-party<->distro
side-grades, etc.

So what I do is use Debian's python packages as a stable baseline, and
then use a virtualenv+pip on top to add in specific cutting-edge
packages on a case-by-case basis. That way apt-get upgrade can still
be trusted to bring me bugfixes on a regular schedule, I only have to
keep track of the packages that I've specifically taken responsibility
for, and if I really mess things up I can always blow the virtualenv
away and start over. For my use cases, debian+virtualenv+pip (for all
its limitations) is still superior to what a debian+third-party-.debs
approach would be.
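
In script form that layering looks roughly like this (untested sketch;
the env path and the version pin are made up, and --system-site-packages
is just how I'd keep Debian's packages visible from the env):

# Hedged sketch of the debian + virtualenv + pip layering described above.
import subprocess

env = '/home/me/envs/project-env'   # hypothetical location

# The virtualenv sits on top of the Debian-packaged baseline.
subprocess.check_call(['virtualenv', '--system-site-packages', env])

# Pin only the cutting-edge pieces I take responsibility for; apt-get
# keeps handling everything else.
subprocess.check_call([env + '/bin/pip', 'install', 'pandas==0.8.0'])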

But anyway, this is sort of a tangent -- really my point is that I
would be happy to switch to debian+virtualenv+bento, but only if I can
be reasonably confident that I'll be able to install arbitrary Python
packages as I discover I need them.

- N