Portable Conda Environments


Corwin Brown

May 11, 2015, 1:30:14 PM
to co...@continuum.io
Hiya! If this is the wrong group for this question, please let me know and I'll move right along!

At work I've been tasked with creating a fully self-contained Python package. With pip/virtualenv it was reasonably painless to create a fairly lightweight portable environment I could then package up with my application. However, the bulk of the company primarily uses Python for analytics and relies heavily upon Conda.

In attempting to use the same technique with Conda, I've run into some roadblocks. Conda by default uses hardlinks to copy its flavor of Python into an environment, which is a non-starter. I can set the '--copy' flag, but it copies over the entire Python stdlib, which leads to an environment that is portable, but bloated.

Virtualenv handles this by symlinking only a few necessary libraries from the system Python install, then dropping in a custom site.py file that allows the environment to drop out to the system install for stdlib stuff. Does Conda have anything set up to do something similar? I've poked through its source code and haven't found anything, but I figured I'd ask you fine professionals before I put together something gross and hacky, although I suspect I'm faced with a problem Conda was never intended to solve.

TL;DR

Is there a way I can use Conda to create a fairly lightweight and portable environment? Preferably without the bloat of copying over the entire Python stdlib?

Thanks all!

Aaron Meurer

May 11, 2015, 4:41:59 PM
to Corwin Brown, conda
You can specify an environment as an environment.yml using conda-env
(https://github.com/conda/conda-env).

Why are hard-links a non-starter? I guess I'm misunderstanding exactly
what you are trying to do.

Note that conda environments in general are not relocatable, that is,
once you install a package to an environment, there may be files in
that package that have the path to the environment coded into them.
You can check each package's info/has_prefix file to see which files
have hard-coded paths (typically this is just shebang lines in entry
point scripts).
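
As a rough illustration of the check described above, here is a minimal Python sketch that parses the text of an info/has_prefix file. It assumes one entry per line, written either as a bare file path or as `placeholder mode path` (with mode being `text` or `binary`). That layout and the function name `parse_has_prefix` are assumptions for illustration, not something stated in this thread, so check them against the conda version you actually use.

```python
import shlex


def parse_has_prefix(text):
    """Parse has_prefix content into (placeholder, mode, path) tuples.

    Assumed format (not confirmed by this thread): each non-empty line is
    either a bare path, or three shell-style fields "placeholder mode path".
    For bare paths the placeholder and mode are reported as None.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = shlex.split(line)
        if len(parts) == 3:
            placeholder, mode, path = parts
        else:
            # Bare-path form: only the file name is known.
            placeholder, mode, path = None, None, line
        entries.append((placeholder, mode, path))
    return entries


# Hypothetical sample content, made up for this example.
sample = "bin/ipython\n/opt/build/env text bin/pip\n"
for entry in parse_has_prefix(sample):
    print(entry)
```

Running something like this over every installed package's has_prefix file would give the full list of files that need patching after a move.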

Aaron Meurer

Corwin Brown

May 11, 2015, 5:03:20 PM
to co...@continuum.io, blak...@gmail.com
Sorry Aaron, I responded in my e-mail, and it didn't show up on the google-group. So I'm posting it again -- Sorry for the spam!!!

---

Hardlinks are a non-starter because I need that environment to be relocatable, and since hardlinks point directly to an inode it's dependent upon that particular filesystem. So (please correct me if I'm wrong!) it would be impossible to plop it down somewhere else.

So far the only hardcoded paths I've found are in the shebang lines in conda_env/bin, and those are easy enough to replace (just read the first line and, if it matches a regex, replace it with the app's new path).
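
The shebang replacement described above could be sketched roughly as follows in Python. This is only an illustration: the regex, the function names `rewrite_shebang` and `relocate_env`, and the assumption that the interpreter always sits at `<prefix>/bin/python*` with no extra arguments are all hypothetical choices, not something conda provides.

```python
import re
from pathlib import Path

# Matches a first line such as "#!/old/env/bin/python" or ".../bin/python2.7".
# Shebangs with interpreter arguments are deliberately left alone.
SHEBANG_RE = re.compile(rb"^#!.*/bin/(python[0-9.]*)[ \t]*$")


def rewrite_shebang(script, new_prefix):
    """Point the script's shebang at new_prefix/bin/<python>; True if changed."""
    path = Path(script)
    data = path.read_bytes()
    first, sep, rest = data.partition(b"\n")
    match = SHEBANG_RE.match(first)
    if match is None:
        return False
    new_first = b"#!" + str(new_prefix).encode() + b"/bin/" + match.group(1)
    if new_first == first:
        return False
    path.write_bytes(new_first + sep + rest)
    return True


def relocate_env(bin_dir, new_prefix):
    """Rewrite matching shebangs for regular files in bin_dir; return their names."""
    changed = []
    for entry in sorted(Path(bin_dir).iterdir()):
        if entry.is_file() and not entry.is_symlink():
            if rewrite_shebang(entry, new_prefix):
                changed.append(entry.name)
    return changed
```

A call such as `relocate_env("conda_env/bin", "/srv/app/conda_env")` (both paths made up here) would be run once after extracting the environment at its new location. Files whose first line does not match, including binaries and `/usr/bin/env` style scripts, are left untouched.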

The idea is I have a developer write some code, I now need to deploy that code to, potentially, hundreds of servers in a reliable and repeatable fashion. Ideally this would be done with as little post processing as possible (I'm trying to avoid having to manually install dependencies hundreds of times at deploy time instead of once at "build" time). I also have to work around that my production machines do not have an external internet connection, so if they need to install a bunch of packages each time, I now have to run a Conda repo in each datacenter, which in and of itself isn't tooooo bad, but I'd rather avoid it if I can. There's also the matter of compile time stuff. I think Conda mitigates this for the most part, but I need to have pre-compiled bits. I don't want someone to be able to run gcc on a production server for example.

TL;DR

I need to easily deploy a bunch of scripts/apps written using Conda environments out to production machines, that do not have access to compilers or the external internet. Ideally, I would create a deb package/zip file that contains everything that script needs to run, then just extract that on the target machine.

Aaron Meurer

May 11, 2015, 5:19:14 PM
to Corwin Brown, conda
Copying a hard-linked installed package to another filesystem will
automatically copy it. Remember that literally every file on your file
system is a "hard link".
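
The point about hard links can be checked with a small Python sketch. It assumes a POSIX-style filesystem that supports `os.link`, and the function name `hardlink_copy_demo` is made up for this example. The demo stays on a single filesystem, but a copy reads the data and writes a new file in the same way when the destination is on another filesystem.

```python
import os
import shutil
import tempfile


def hardlink_copy_demo():
    """Create a file, hard-link it, copy the link, and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        linked = os.path.join(tmp, "linked.txt")
        copied = os.path.join(tmp, "copied.txt")

        with open(original, "w") as fh:
            fh.write("payload")

        os.link(original, linked)     # second name for the same inode
        shutil.copy2(linked, copied)  # copying reads the data, writes a new file

        with open(copied) as fh:
            content = fh.read()

        return {
            "link_shares_inode": os.stat(original).st_ino == os.stat(linked).st_ino,
            "link_count": os.stat(original).st_nlink,
            "copy_shares_inode": os.stat(copied).st_ino == os.stat(original).st_ino,
            "copy_content": content,
        }


print(hardlink_copy_demo())
```

The copied file has its own inode and the full contents, so an archive or copy of a hard-linked environment carries real files, not references to the source filesystem.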

I am confused if you don't want to include the standard library where
you want it to come from.

I think the simplest way to do what you want is to use something like
conda-env, or alternately, if you don't want to download the packages
at install time, you can tar up a bunch of conda packages into a
single tarball and use conda to install that.
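
A minimal Python sketch of the bundling half of that suggestion is below. It assumes the conda packages are `.tar.bz2` files sitting in one directory; the function name `bundle_packages` and the paths in the usage note are hypothetical. How conda is then pointed at the bundle is not shown, since this thread does not spell out the command.

```python
import tarfile
from pathlib import Path


def bundle_packages(pkg_dir, bundle_path):
    """Pack every *.tar.bz2 file in pkg_dir into one tarball; return their names.

    The outer archive is left uncompressed because the packages inside are
    already compressed.
    """
    packages = sorted(Path(pkg_dir).glob("*.tar.bz2"))
    with tarfile.open(str(bundle_path), "w") as bundle:
        for pkg in packages:
            # Store each package at the top level of the bundle.
            bundle.add(str(pkg), arcname=pkg.name)
    return [pkg.name for pkg in packages]
```

For example, `bundle_packages("downloaded_pkgs", "app-deps.tar")` would be run once on a build machine with internet access, and the resulting file shipped to the production hosts.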

Aaron Meurer

Chris Barker

May 12, 2015, 12:05:06 PM
to conda
On Mon, May 11, 2015 at 2:18 PM, Aaron Meurer <aaron....@continuum.io> wrote:
> I am confused if you don't want to include the standard library where
> you want it to come from.

I suppose it might make some sense to have the standard library be one that is pre-installed on the system (and shared among many systems?)

But while it may seem wasteful, it's really not all that large once you compare it to things like numpy and scipy...


> if you don't want to download the packages
> at install time, you can tar up a bunch of conda packages into a
> single tarball and use conda to install that.

that sounds like the way to go. But:

it seems you don't really want a python environment, what you want is an application, that happens to be written in python, with a bunch of dependencies. In which case, maybe the way to go is PyInstaller or cx-freeze -- these bundle up python and all the packages you need for an application so they can be installed as a single deliverable.

-Chris




--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Jason Moore

May 12, 2015, 12:50:17 PM
to Chris Barker, conda
Might be worth checking out hashdist for your use case: https://hashdist.github.io/

Corwin Brown

May 12, 2015, 1:00:49 PM
to co...@continuum.io
> it seems you don't really want a python environment, what you want is an application, that happens to be written in python, with a bunch of dependencies.

This really nails exactly what I'm trying to do. It's a little weird finding data for Conda for more enterprisey/production application usage as opposed to scientific usage. My use cases are super not the target demographic.

> I suppose it might make some sense to have the standard library be one that is pre-installed on the system (and shared among many systems?)

That was my thinking. That is what virtualenv does (which is the side of Python I'm much more familiar with, so I think I'm trying to push a square through a circle hole because it's what I'm used to). But yeah, in my testing I'm finding myself only saving 20-30MB by removing the stdlib, then Numpy eats my lunch. I'm starting to test PyInstaller and cx-freeze a little more, but they present some cross-platform challenges, whereas a self-contained tarball/zip file I can just plop down and run. I'm also concerned that both seem to have flaky development cycles. It looks like the last tagged release of PyInstaller was 2 years ago.

It looks to me like I'm just going to have to deal with the larger packages for the moment.

Thanks so much for the response!!!

Corwin Brown

May 12, 2015, 1:01:56 PM
to co...@continuum.io, chris....@noaa.gov
I hadn't heard of Hashdist! That actually looks very promising! Thank you for bringing it up! And hey, looks like it was developed down the street from my office!