One ring to rule them all

Skip Montanaro

Nov 20, 2015, 9:05:51 AM
to anac...@continuum.io
On Thu, Nov 19, 2015 at 7:01 PM, Chris Barker <chris....@noaa.gov> wrote:
> I'm lost -- didn't this all start because you wanted people to be able to
> install packages without admin privileges???

Yes, and the lack of clarity is my fault. This is one of those things
where a five minute conversation would have laid out the entire
situation. Instead, I've thrown out bits and pieces of the problem.
So, I'll start a new thread, as PYTHONPATH issues are just one piece
of the puzzle.

Here's where we sit today. I happen to work at a trading firm, but I
doubt that changes things much. We have production applications
written in Python, and a number of quants use Python for their model
research. I'm sure other organizations have a similar setup, where
production tools use one subset of the Python ecosystem and
non-production users use an overlapping set of packages. The
environment I work in is mostly Linux (OpenSuSE 12.2, someday to move
to a higher number).

We used to be a Solaris shop, and in the multi-year transition period
to Linux, it was decided that maintenance for Python and a number of
other open source packages (Perl, Emacs, etc) would be outsourced to a
third party company. At that point we had three versions of Python:

* /usr/bin/python (both Linux and Solaris, 2.7.3 by that point on Linux)

* The third-party vendor's supplied Python (2.7.2)

* And on Solaris, a locally built Python 2.4

For the most part, the third-party vendor's Python took over. The IT
folks wrapped all their C++ stuff with Boost::Python so those
libraries were available to production applications. However, all was
not well. Let's ignore Solaris now, as it's all but out the door.
/usr/bin/python and the external Python were (for reasons not clear to
me) built with different internal Unicode representations, making them
binary incompatible. (It was OpenSuSE who departed from the default.)
That meant that if someone needed a package available as part of
OpenSuSE (say, something like scikit-learn), the OpenSuSE version
wouldn't work. We had to build and package it against the external
vendor's Python ourselves for internal distribution. This was
especially problematic for our researchers, as they were the ones who
needed the most packages beyond what our production Python apps
required. And the sysadmins failed to understand
the incompatibility between the two versions of Python. They'd install
a package using Zypper, and close the ticket. *sigh*

The quants know about pip as well, so they might find something
interesting, install it using "pip install --user", then go on their
way. If they ran into problems though, and asked for help, they
wouldn't necessarily get any, as the IT folks had no idea what their
environment was. Then there was the IPython notebook problem. A quant
might be using Anaconda or Enthought's distribution, generate a
notebook, then not be able to share it with management, because they
were using the more vanilla (and further out-of-date) packages from
the external vendor.

In the future, with Solaris gone, and us updated to a newer version of
OpenSuSE, this will be less of a problem, but I imagine the quants
will always want something a bit more up-to-date than what the stock
distribution supplies. There may well be other packages which aren't
available from the vendor at all.

Which brings me to Anaconda. Note that if each user installs Anaconda
at different times, the situation I described above still exists, as
IT won't know how Fred's Anaconda install is different from Mary's.

This is how I envision things might improve with a single Anaconda environment:

* Someone like me downloads and installs it. If there are packages
which need installing, then I conda install or pip install them. Then
I wrap the entire ~/miniconda business up into an internal package,
and make it available for installation on all research machines.

* One of the quants wants some new package, X, which isn't available
in the package I built, so he installs it locally using pip install
--user (or something similar using the conda command, if it supports
per user installation). Then he opens a help ticket asking for X in
the Anaconda package.

* I get handed the ticket, install X using conda install, repackage
and release. That evening, the new package is installed on all
research machines automatically.

The time between the second and third steps might be a couple of days,
depending on how busy I am. Once repackaged though, the quant can
uninstall his version of X, perhaps by discovering that his version of
X is now shadowing the version in the Anaconda package (back to the
tool I proposed). If he encounters problems, he can open a help ticket
which references the Anaconda install, and the IT folks will know just
what he's using. Today, they have no idea.
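
The shadowing check mentioned above could be sketched roughly as
follows. This is only an illustration, not the tool proposed in the
earlier thread: the names top_level_names and find_shadowed are
hypothetical, and the sketch only compares top-level module and package
names in two directories, so it ignores versions and namespace packages.

```python
import os


def top_level_names(directory):
    """Return importable top-level names (packages and modules) in a directory."""
    names = set()
    if not os.path.isdir(directory):
        return names
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        if os.path.isdir(path) and os.path.exists(os.path.join(path, "__init__.py")):
            names.add(entry)          # a regular package
        elif entry.endswith(".py"):
            names.add(entry[:-3])     # a single-file module
    return names


def find_shadowed(user_dir, central_dir):
    """Names present in both directories.

    The copy in user_dir wins at import time if user_dir comes earlier
    on sys.path than central_dir.
    """
    return sorted(top_level_names(user_dir) & top_level_names(central_dir))
```

Called with the user's site directory (site.getusersitepackages() in the
standard library) and the shared install's site-packages directory, the
result is a list of candidates for the quant to uninstall.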

I hope that helps explain why I'm trying to have my cake and eat it too.

Skip Montanaro

Chris Barker

Nov 20, 2015, 12:41:33 PM
to anaconda
On Fri, Nov 20, 2015 at 6:05 AM, Skip Montanaro <skip.mo...@gmail.com> wrote:
> Here's where we sit today. I happen to work at a trading firm, but I
> doubt that changes things much.

Indeed -- I work for a group in the federal government -- similar issue, I'm sure!

> We have production applications
> written in Python, and a number of quants use Python for their model
> research. I'm sure other organizations have a similar setup, where
> production tools use one subset of the Python ecosystem and
> non-production users use an overlapping set of packages.

Yup -- that is where the challenges lie.

And, indeed, this is exactly the situation that virtual environments were supposed to help solve -- but they don't support non-Python dependencies well -- hence conda. And conda was designed to support this use case -- maybe not in an ideal way, but pretty close.

> third party company. At that point we had three versions of Python:
>
> * /usr/bin/python (both Linux and Solaris, 2.7.3 by that point on Linux)
>
> * The third-party vendor's supplied Python (2.7.2)
>
> * And on Solaris, a locally built Python 2.4

This is exactly why conda provides Python itself, too -- you really need to be able to standardize that across OS versions, etc., as well. It's also why I personally have always advocated NOT using the system Python -- IT policy folks really want you to use the system version of things as much as possible, but it's really just a recipe for a pain in the *^%&.
 
> For the most part, the third-party vendor's Python took over. The IT
> folks wrapped all their C++ stuff with Boost::Python so those
> libraries were available to production applications. However, all was
> not well.

yup -- you really are looking for conda now!
 
> Which brings me to Anaconda. Note that if each user installs Anaconda
> at different times, the situation I described above still exists, as
> IT won't know how Fred's Anaconda install is different from Mary's.
>
> This is how I envision things might improve with a single Anaconda environment:

I think you may be approaching this from the wrong direction. If you want everyone to have exactly the same setup, then yes, a single centrally managed Anaconda install is the way to go.

But I suspect that's not possible:

- Production applications will need to have a pre-defined and unchanging set of packages -- none of them should ever be upgraded without a deliberate decision and testing, etc.

- Developers of the next version of production apps will need newer versions of packages than the in-production versions.

- Your 'quants' (or in my group, scientists) are doing experimental work, and need to be able to try the latest and greatest version of packages.

But this is EXACTLY the use case that conda environments are made for.

Each production app has a particular environment spec that it is tested on -- to deploy, you build an environment from an environment.yml file.

That way, you get exactly the environment the app was tested on -- and it can be easily reproduced.
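
To make the idea of a pinned environment spec concrete, here is a minimal sketch that writes out the text of such a file. It is an illustration rather than anyone's actual deployment tooling: render_environment_yml is a hypothetical helper, and the environment name, channel, and package versions are made-up example values.

```python
def render_environment_yml(name, channels, pinned):
    """Build environment.yml text with every package pinned to one version."""
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    # Sort so the file is stable and diffs cleanly in source control.
    lines += ["  - {}={}".format(pkg, version) for pkg, version in sorted(pinned.items())]
    return "\n".join(lines) + "\n"


# Illustrative values only; a real app would list what it was tested against.
spec = render_environment_yml(
    "prod-app", ["defaults"], {"python": "2.7.10", "numpy": "1.10.1"}
)
print(spec)
```

Saved as environment.yml, text like this is the kind of file that conda env create -f environment.yml builds an environment from.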

Each in-development app has an environment.yml file that is maintained by the developers (in source control) -- it is updated as new or upgraded packages are required and tested. When it's time for release, you've got a way to build a compatible environment.

Then there are the experimenters -- these folks are going to have an environment in which they are installing extra stuff, upgraded stuff, etc., willy-nilly. They may not need to use environments -- personally, I just stick with one main conda install and update as I see fit; it's my problem if I break something I was working on six months ago...

Which brings us to folks sharing notebooks around -- for that, I'd say you either:

  - have a standardized environment for the quants -- put the environment.yml file in a central repo somewhere, and then when someone shares out a notebook, they can say -- this needs environment version 1.2.3 (or maybe master) -- and away they go.

However, with all this there is still a need for some custom-to-your-organization package management -- you have your custom C++ / Boost code, and you probably need some packages that are not in the standard conda channel.

For that, you'll want to maintain your own custom channel of conda packages -- you can simply host them on anaconda.org, or, if you don't want to share out your custom code (I imagine you don't), you can put them on an internal server somewhere.

then your users do a simple:

install miniconda
add your channel:

either build an environment for the project at hand or simply install everything they need into their master conda:

conda install --file our_conda_packages.txt

(and this could be a simple script)

The trick is to make sure that they only add that channel to your package.

I'm doing this for our stuff, and it works pretty well. I've put the conda channel up on anaconda.org (and the build recipes on GitHub), because everything we do is open source, but you can manage it internally instead.

In short, what this means is that you are not centrally managing a whole conda install, but rather centrally managing a channel of conda packages.

And this keeps you from needing any sort of "user" vs "central" installed packages. Of course, it means everyone gets copies of everything but disk space is cheap, yes?

> * Someone like me downloads and installs it. If there are packages
> which need installing, then I conda install or pip install them. Then
> I wrap the entire ~/miniconda business up into an internal package,
> and make it available for installation on all research machines.

So in this case, what you'd do is add that package to your channel -- and maybe update the "standard" environment.yaml file.


> * One of the quants wants some new package, X, which isn't available
> in the package I built, so he installs it locally using pip install
> --user (or something similar using the conda command, if it supports
> per user installation). Then he opens a help ticket asking for X in
> the Anaconda package.

that would work the same way.
 
> * I get handed the ticket, install X using conda install, repackage
> and release. That evening, the new package is installed on all
> research machines automatically.

If you want it automagic, then you might have to have a cron job on each machine that pulls the latest standard_requirements.yml file and does the update. But while that would be great for new packages, it would probably be a bad idea for upgraded packages -- people won't like having packages upgraded underneath them.
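
One way a nightly job could respect that distinction is to split the standard spec into packages that are missing (safe to add) and packages whose version would change (left for the user to opt into). This is a minimal sketch, assuming the installed and standard package sets have already been read into plain name-to-version dictionaries; plan_update is a hypothetical helper, not part of conda.

```python
def plan_update(installed, standard):
    """Split the standard spec into safe additions and held-back version changes.

    Both arguments map package name -> version string.
    """
    to_install = {}
    held_back = {}
    for name, wanted in standard.items():
        current = installed.get(name)
        if current is None:
            # New package: nothing the user relies on can change underneath them.
            to_install[name] = wanted
        elif current != wanted:
            # Version change: record it, but leave the decision to the user.
            held_back[name] = (current, wanted)
    return to_install, held_back
```

The job would then install only what is in the first dictionary and perhaps mail the user the second.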
 
> If he encounters problems, he can open a help ticket
> which references the Anaconda install, and the IT folks will know just
> what he's using. Today, they have no idea.

you can always have the user do a:

conda env export > environment.yml

And you'll know what they have (I think this captures pip-installed packages as well -- but I'm not totally sure about that). Also -- the help desk can then re-create that exact environment to debug the issue.
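
If the help desk wants to compare two such exports without installing anything, the file can be read with a few lines of Python. This is a sketch under assumptions: it expects the layout conda env export normally writes (a top-level dependencies: list of name=version=build strings, with pip-installed packages in a nested pip: list of name==version strings), it is not a general YAML parser, and parse_export is a hypothetical name.

```python
def parse_export(text):
    """Return ({conda_name: version}, {pip_name: version}) from exported text."""
    conda_pkgs, pip_pkgs = {}, {}
    in_deps = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw.startswith((" ", "-")):
            # A top-level key such as "name:", "channels:" or "dependencies:".
            in_deps = line == "dependencies:"
            continue
        if not in_deps or not line.startswith("- "):
            continue
        item = line[2:].strip()
        if item.endswith(":"):
            continue  # header of the nested pip section
        if "==" in item:
            name, version = item.split("==", 1)
            pip_pkgs[name] = version
        else:
            parts = item.split("=")
            conda_pkgs[parts[0]] = parts[1] if len(parts) > 1 else None
    return conda_pkgs, pip_pkgs
```

Comparing the dictionaries from two users' exports then shows where their environments differ.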

> I hope that helps explain why I'm trying to have my cake and eat it too.

yup -- all reasonable. And I wrote all this up to clarify my thoughts. So far, I've been doing all this primarily for external users -- but I may do a bit more and make it easier for our scientists and developers as well.

For reference: here are my instructions to our users of one of our messy packages:


and here is our channel:


and here are the conda recipes:


note that most of those packages are simply conda builds of common python packages that could be pip installed -- but I found that pip and conda don't play all that well together when there are inter-dependencies. And it's easy to build a conda package from a pip-installable package anyway.

We're not using this for Linux yet -- but probably will soon -- if conda itself works OK on your Linux versions, then this should be no problem.

Also -- there are some nifty tools to help maintain those channels -- I'm using obvious-ci:


really nice!

There is also the conda-forge project:


which is seeking to create a set of community maintained packages -- something in between Continuum doing it all and each of us doing everything ourselves.

also -- if you have production scripts you pass around, this might be useful:


NOTE: all of that is due to Phil Elson's Awesome work -- thanks Phil!


Plug for conda-forge: it was born of discussion at SciPy 2015 among a handful of folks who had been maintaining packages for their particular communities: the met-ocean community, astropy, etc. But we figured there are in fact a lot of overlapping packages -- so a lot of duplicated effort. So maybe you'll come join us to help with the financial analysis-oriented packages.

HTH,

-Chris



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Pete Jemian

Nov 20, 2015, 1:03:15 PM
to anac...@continuum.io


On 11/20/2015 11:40 AM, Chris Barker wrote:

> always advocated NOT using the system python -- IT policy folks really
> want you to use the system version

Our IT staff here at the APS (another .gov) have worked out an agreement
on Python:

* /usr/bin/python has a specific version needed by the OS
* /our/local/Anaconda/x86_64/bin/python is maintained by
  the scientists for the scientists; we add packages to this
  on demand from our scientists

We now realize Python's local package management is still evolving:
* pip install package --user is the old way of doing things
  and will trigger long discussions with no easy solution
  *when* the user wants to pick from several different
  available Pythons
* virtual environments, such as from conda or python,
  are *the* way to manage local package requirements today.
  This allows installation of packages
  possibly conflicting with the general package suite.
  The user can install in the local virtual environment
  at their own discretion.

So far, this way of working has minimized the conflicts.

Pete Jemian

Nov 20, 2015, 1:11:58 PM
to Anaconda - Public
You may need to examine the --user installed files in ~/.local/lib/python2.7 (or ...).
Instead of removing the directory, I suggest renaming it so that it no longer matches the pattern Python looks for and is not placed into sys.path when Python starts.
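
A minimal sketch of that rename, for anyone who wants to script it. The helper name set_aside and the ".disabled" suffix are arbitrary choices for illustration; the directory to pass in is whichever user site directory applies on your system.

```python
import os


def set_aside(directory, suffix=".disabled"):
    """Rename directory so Python no longer finds it.

    Returns the new path, or None if the directory does not exist.
    """
    if not os.path.isdir(directory):
        return None
    target = directory + suffix
    if os.path.exists(target):
        # Never clobber an earlier backup.
        raise OSError("refusing to overwrite " + target)
    os.rename(directory, target)
    return target
```

Renaming the directory back restores the packages. For a one-off check, running python with the -s option or setting the PYTHONNOUSERSITE environment variable also keeps the user site directory off sys.path without touching any files.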

Chris Barker - NOAA Federal

Nov 23, 2015, 11:18:09 AM
to anac...@continuum.io
> > always advocated NOT using the system python -- IT policy folks really want you to use the system version

> Our IT staff here at the APS (another .gov) have worked out an agreement on Python:
>
> * /usr/bin/python has a specific version needed by the OS
> * /our/local/Anaconda/x86_64/bin/python is maintained by
> the scientists for the scientists, we add packages to this
> on demand from our scientists

Sounds like a good system. We more or less do that -- but it gets
tricky when we need to deploy a web service...

> * virtual environments, such as from conda or python,
> are *the* way to manage local package requirements today.

Yup -- and DON'T set PYTHONPATH!

:-)

-CHB

Pete Jemian

Nov 23, 2015, 12:14:43 PM
to anac...@continuum.io
An interesting use case:

On 11/23/2015 10:18 AM, Chris Barker - NOAA Federal wrote:
> Sounds like a good system. We more or less do that -- but it gets
> tricky when we need to deploy a web service...

For this, our IT group manages the www server(s). They install their
own version of Python with an agreed package suite (again not
/usr/bin/python) and use this with the web service.

/it/group/Anaconda/x86_64/bin/python

This is consistent with the above model but allows special handling for
this use case.