Distributing a program which links to anaconda's python


kasper....@gmail.com

Jul 28, 2018, 4:41:28 PM
to Anaconda - Public
I need some insider information concerning best practise when it comes to distributing a complex software package written in C++ which links to Anaconda's python (https://cadabra.science in case you are interested, I'll summarise below though).

The software is multi-platform (linux/macos/windows), and consists of a python module written in C++ (using pybind11) together with a graphical user interface which uses gtkmm. It also links to a whole slew of other libraries which are standard on most Linux platforms. On windows I get these other libraries from the vcpkg distribution, on macos I use homebrew.

Most of my windows users expect the program to work with an existing installation of Anaconda's python. So I would like to make an installer which has a program binary and python module which link to Anaconda's python. At build time, this all works fine. However, if the user has not ticked the 'Make Anaconda my default Python install' (the first option of the Anaconda installer, which leads to the scary red warning), then my program will not find python36.dll at start, and will refuse to work.

What is the recommended way of distributing a complicated product like this? I cannot really build everything against Conda packages inside the Conda ecosystem, as many of the libraries which I use are not in there, so I would anyway rely on other sources (vcpkg or homebrew). Any suggestions?

Thanks,
Kasper

1.0.0.19

Juan E. Sanchez

Jul 28, 2018, 5:04:21 PM
to anac...@continuum.io
Hello,

Sorry for top posting.

I have a C++ program which is an executable with either an embedded
Python 2.7 or Python 3.6 interpreter. My module is embedded into this exe.

https://github.com/devsim/devsim

All of the dependencies are linked in statically, except for the mkl and
python runtimes from anaconda. The release package does not contain
Anaconda. The user downloads it on their own.

On windows, the program is started from the Anaconda prompt. The user
activates their 2.7 or 3.6 environment before starting the appropriate exe.

The anaconda environment sets the correct path automatically for loading
the python and mkl runtimes. This also works on Linux and Mac OS X.

Because of some "features" of Mac OS X, the appropriate environment is
activated and DYLD_FALLBACK_LIBRARY_PATH must be set to find all of the
dependent libraries. This would not be necessary if I either:

1. Used install_name_tool to fix up the paths to the needed Anaconda
libraries.
2. Converted my complicated C++ program into a Python module loaded as
shared libraries from the Anaconda Python executables.

I hope this helps. Please feel free to contact me if you need any
clarifications of my method.

Regards,

Juan




Ray Donnelly

Jul 28, 2018, 7:14:33 PM
to Anaconda - Public
We have no experience with vcpkg, though it seems an xkcd link is appropriate. We build everything from source very carefully so that we can assure compatibility.

I recommend you just make a conda package for your Python extension module. Conda-build should deal with all the tricky DSO issues you face. You need to put the right things in the right place though. Are you using distutils, setuptools or pip?

Or are you embedding Python? In that case you need to make sure your executable is in the same folder as this DLL (or in PATH but that's less reliable), but that's just completely standard stuff on Windows and nothing to do with the Anaconda Distribution.

Forget DYLD on macOS. Apple have removed these variables with SIP, so using those is no fix for anything.


--
Community Discussion Forum for Anaconda
---
You received this message because you are subscribed to the Google Groups "Anaconda - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anaconda+u...@continuum.io.
To post to this group, send email to anac...@continuum.io.

Juan E. Sanchez

Jul 28, 2018, 9:25:15 PM
to anac...@continuum.io
> Or are you embedding Python? In that case you need to make sure your
> executable is in the same folder as this DLL (or in PATH but that's less
> reliable), but that's just completely standard stuff on Windows and
> nothing to do with the Anaconda Distribution.

If you start your application from the Anaconda Prompt, and start the
desired environment, you do not need to copy your executable around on
Windows.

>
> Forget DYLD on macOS. Apple have removed these variables with SIP, so
> using those is no fix for anything.
>

The DYLD variables still work on Mac OS X High Sierra. The shared
libraries I link against from Anaconda are all loaded properly after I
set DYLD_FALLBACK_LIBRARY_PATH

so these libraries (listed on binary from otool -L):

@rpath/libpython3.6m.dylib (compatibility version 3.6.0, current version 3.6.0)
@rpath/libmkl_rt.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libsqlite3.0.dylib (compatibility version 9.0.0, current version 9.6.0)
@rpath/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
@rpath/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0)

are found.

This is for a binary executable with an embedded Python interpreter.

Because of SIP there are applications that will forget the DYLD
variables when calling child processes. For example, the
DYLD_FALLBACK_LIBRARY_PATH is forgotten if you run lldb, and then you
have to set the variable after you have started it.

Arguably, it is better to set the RPATH on the executable. According to
this:

https://stackoverflow.com/questions/33991581/install-name-tool-to-update-a-executable-to-search-for-dylib-in-mac-os-x

I can avoid DYLD_FALLBACK_LIBRARY_PATH by using the install_name_tool.

install_name_tool -add_rpath "${HOME}/anaconda/envs/python3/lib" foo/bin/devsim_py3

Regards,

Juan
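
For what it's worth, the otool -L check and the install_name_tool call described above could be scripted. What follows is only a minimal sketch, not the method devsim actually uses: the function names rpath_dependencies and add_rpath_command are made up for illustration, and the /usr/lib/libSystem.B.dylib line in the sample is a hypothetical non-@rpath entry added to show the filtering.

```python
import re


def rpath_dependencies(otool_output):
    """Return the library names that `otool -L` lists as @rpath-relative."""
    libs = []
    for line in otool_output.splitlines():
        match = re.match(r"\s*@rpath/(\S+)", line)
        if match:
            libs.append(match.group(1))
    return libs


def add_rpath_command(binary, lib_dir):
    """Build the install_name_tool invocation as an argument list."""
    return ["install_name_tool", "-add_rpath", lib_dir, binary]


# Two lines taken from the listing above, plus one hypothetical system library.
sample = """\
@rpath/libpython3.6m.dylib (compatibility version 3.6.0, current version 3.6.0)
@rpath/libmkl_rt.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
"""

if __name__ == "__main__":
    print(rpath_dependencies(sample))
    print(add_rpath_command("foo/bin/devsim_py3", "/path/to/anaconda/envs/python3/lib"))
```

The argument list could be handed to subprocess.run on a Mac. Every library returned by rpath_dependencies has to be resolvable through some rpath entry (or a DYLD variable) at load time.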

Kasper Peeters

Jul 29, 2018, 8:30:19 AM
to Ray Donnelly, anac...@continuum.io
> We have no experience with vcpkg, it seems an xkcd link is appropriate
> though.

I know what you mean, though I'd say that Anaconda is probably just as
guilty as vcpkg in this respect. Anyway, that's not really what I want
to get into here, I am trying to find a pragmatic solution and would
like to know what's best (in the sense of being recommended by the
Anaconda people).

> I recommend you just make a conda package for your Python extension
> module.

I cannot do that because not all dependencies are in the Conda
ecosystem.

> Or are you embedding Python? In that case you need to make sure your
> executable is in the same folder as this DLL

My software embeds Python, but the embedded Python then imports a
custom module. So there is both a module and an embedded interpreter.
Both of these depend on some libraries which are not part of the
Anaconda ecosystem.

So the problem is: I want to provide a binary installer for my
software. I do not want to bundle that installer with an entire Python
ecosystem (obviously). So I require that users have Anaconda installed
already. However, if I do not include python36.dll in the folder of the
executable, it fails to work. It seems to me that there should be a
clean way of making my executable figure out where Anaconda is
installed, and then link to the python36.dll there.

Including python36.dll in my installer seems like a waste, since the
software cannot work without various python modules which are part of
the Anaconda install anyway.

Hope this makes it a bit more clear.

Thanks,
Kasper


Kasper Peeters

Jul 29, 2018, 8:32:26 AM
to anac...@continuum.io
> On windows, the program is started from the Anaconda prompt. The
> user activates their 2.7 or 3.6 environment before starting the
> appropriate exe.

That's not really a solution for me, I want my users to be able to
simply double click the icon of my software. Even though it embeds
Python, it does not require that people know anything about Python, so
asking them to start things via Anaconda will confuse them.

Thanks,
Kasper

Ray Donnelly

Jul 29, 2018, 8:58:49 AM
to Kasper Peeters, Anaconda - Public
I do not understand this. Clearly you need our DLL to be present, so putting your executable beside it in your prefix is the standard and most reliable way to make sure it gets loaded.

Kasper Peeters

Jul 29, 2018, 9:23:01 AM
to Ray Donnelly, Anaconda - Public



> I do not understand this. You need our DLL to be present clearly, so
> putting your executable beside it in your prefix is the standard way
> and most reliable to make sure it gets loaded.

Yes, but since my software needs other parts of Anaconda as well, it seems silly to bundle it with parts of Anaconda (python36.dll) and leave other parts out. It's a solution, but far from elegant.

I had hoped that my program could figure out the anaconda python36.dll location at runtime. Presumably something similar happens when that same python36.dll runs and needs to figure out where the rest of the anaconda files are located.

Thanks,
Kasper

Juan E. Sanchez

Jul 29, 2018, 9:56:41 AM
to anac...@continuum.io
Hello,

Here is a batch script I run to create .bat files that run my software
against the Python 2, Python 3 or Tcl installed in an Anaconda
installation on Windows.

https://github.com/devsim/devsim_tests_win64/blob/master/run_tests.bat

It assumes that python 2 is installed in:

SET ANACONDAPATH=%USERPROFILE%/Miniconda2/envs

and that I have 2 environments named python2 (for Python 2.7)

%ANACONDAPATH%/python2

and python3 (for python 3.6) installed in:

%ANACONDAPATH%/python3

It calls activate for you in the batch script before calling the
interpreter. Note that PYTHONHOME gets set so that the binary with the
embedded interpreter is able to work.

Perhaps you can write a script that magically finds the installation for
your user and writes batch scripts similar to these.

Regards,

Juan
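
To illustrate the idea of generating such batch files, here is a rough sketch. It is not the content of run_tests.bat: the function name make_launcher, the example paths, and the assumption that activate.bat lives under Scripts in the Miniconda root are all mine and would need checking against a real installation.

```python
import ntpath  # Windows path rules, usable on any platform


def make_launcher(conda_root, env_name, exe_path):
    """Return the text of a .bat file that activates env_name, then runs exe_path.

    Assumes <conda_root>\\Scripts\\activate.bat exists and that environments
    live under <conda_root>\\envs (both are assumptions about the install).
    """
    activate = ntpath.join(conda_root, "Scripts", "activate.bat")
    env_prefix = ntpath.join(conda_root, "envs", env_name)
    lines = [
        "@echo off",
        f'call "{activate}" {env_name}',
        # PYTHONHOME lets the embedded interpreter find its standard library.
        f'set "PYTHONHOME={env_prefix}"',
        # Forward any command-line arguments to the real executable.
        f'"{exe_path}" %*',
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    print(make_launcher(r"C:\Users\me\Miniconda2", "python3", r"C:\devsim\bin\devsim_py3.exe"))
```

An installer could write the returned text to a .bat file and point the desktop shortcut at it, so the user never has to open the Anaconda Prompt.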

Ray Donnelly

Jul 29, 2018, 10:23:52 AM
to Kasper Peeters, Anaconda - Public
Anaconda does nothing special here. There are only three auto load mechanisms by which DLLs are found in Windows: by being located alongside the executable, by being in C:\Windows\System32, or by being on PATH. Pick your poison.

Extension modules aren't autoloaded; they are loaded via LoadLibraryEx (the Windows counterpart of dlopen), which allows specifying the path programmatically.



Ray Donnelly

Jul 29, 2018, 10:26:23 AM
to Kasper Peeters, Anaconda - Public


On Sun, Jul 29, 2018, 2:22 PM Kasper Peeters <kasper....@phi-sci.com> wrote:
>> I do not understand this. You need our DLL to be present clearly, so
>> putting your executable beside it in your prefix is the standard way
>> and most reliable to make sure it gets loaded.
>
> Yes, but since my software needs other parts of anaconda as well, it seems silly to bundle it with parts of anaconda (python36.dll) and leave other parts out. It's a solution, but far from elegant.

Clearly you need to bundle your dependencies. It's up to you to manage how to leave other parts out. IMHO making conda recipes for all of your dependencies and getting away from vcpkg is my recommendation.

Kasper Peeters

Jul 29, 2018, 1:08:57 PM
to anac...@continuum.io
> Clearly you need to bundle your dependencies. It's up to you to
> manage how to leave other parts out. IMHO making conda recipes for
> all of your dependencies and getting away from vcpkg is my
> recommendation.

I need gtkmm for gtk-3, which is _very_ non-trivial to package right
(and it is available out of the box with vcpkg already). Not something I
want to spend my time on. I also need various other libraries which are
non-trivial to build on Windows.

Cheers,
Kasper

Kasper Peeters

Jul 29, 2018, 1:11:33 PM
to Anaconda - Public
> Anaconda does nothing special here. There are only three auto load
> mechanisms by which DLLs are found in Windows. By being located
> alongside the executable, by being in C:\Windows\System32 or by being
> on PATH. Pick your poison.

Yes, so I guess my question is: how can I determine how to set my PATH
variable such that it contains the relevant Anaconda path? (Something
akin to 'pkg-config' on Linux, which provides a standard mechanism to
figure out where packages are installed, in case the analogy helps).

Thanks,
Kasper
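
One possible shape of an answer, once the Anaconda prefix is known: build the PATH value that a launcher would hand to the real executable. This is only a sketch. The subdirectory list in CONDA_PATH_SUBDIRS is my assumption about what conda's activation adds on Windows and should be compared with what the Anaconda Prompt actually sets. The function name path_with_conda is invented for illustration.

```python
import ntpath  # Windows path rules, usable on any platform

# Assumed list of prefix subdirectories that activation puts on PATH on Windows.
# The empty string stands for the prefix itself, where python36.dll lives.
CONDA_PATH_SUBDIRS = [
    "",
    r"Library\mingw-w64\bin",
    r"Library\usr\bin",
    r"Library\bin",
    "Scripts",
]


def path_with_conda(prefix, current_path):
    """Return a PATH string with the conda directories placed in front."""
    entries = [ntpath.join(prefix, sub) if sub else prefix for sub in CONDA_PATH_SUBDIRS]
    existing = [p for p in current_path.split(";") if p]
    # Keep the old entries, but drop any that would duplicate a conda directory.
    remaining = [p for p in existing if p not in entries]
    return ";".join(entries + remaining)


if __name__ == "__main__":
    print(path_with_conda(r"C:\Anaconda3", r"C:\Windows\System32"))
```

Because an implicitly linked DLL is resolved before main() runs, the value has to be in place when the process starts. In practice that means a small launcher (or a generated .bat file) sets PATH and then starts the real program.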

Chris Barker

Jul 30, 2018, 1:44:40 PM
to anaconda, Kasper Peeters
On Sun, Jul 29, 2018 at 7:26 AM, Ray Donnelly <rdon...@anaconda.com> wrote:
> Clearly you need to bundle your dependencies. It's up to you to manage how to leave other parts out. IMHO making conda recipes for all of your dependencies and

I have to agree here, trying to use two independent package managers seems like a recipe for disaster.

> getting away from vcpkg is my recommendation.

A note here though -- conda manages binary packages -- those packages are (mostly) built with conda-build -- conda-build is a system for building conda packages themselves, NOT building software.

What this means is that a conda-build recipe needs to call some other build system to actually build the package -- usually make for OSS stuff, and pip for Python packages. But it can call ANY build system.

So I suspect that you could call vcpkg from a conda recipe, and conda-build would make a nice package of that lib, taking advantage of the vcpkg build system.

Also -- be sure to look to conda-forge to see how many of your dependencies are already there -- you may be pleasantly surprised.


> I had hoped that my program could figure out the anaconda python36.dll location at runtime.

You could probably write a startup script that searches the "usual" places for a conda install, and then adds that to the PATH (and maybe a couple of other env vars) -- essentially like having had the user install it as the default environment.

You could also maybe call the Anaconda activate script from your program.

But really, mixing and matching like this seems very fragile.

-CHB


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov
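
A minimal sketch of the "search the usual places" idea follows, with the caveat that the directory names in CANDIDATE_DIRS and the search roots are guesses about common defaults, not an authoritative list. The one solid fact used is that a conda prefix contains a conda-meta directory. The function name find_conda_installs is made up for illustration.

```python
import os

# Hypothetical default folder names; real installs may be anywhere.
CANDIDATE_DIRS = ["Anaconda3", "Miniconda3", "Anaconda2", "Miniconda2"]


def find_conda_installs(roots, dll_name="python36.dll"):
    """Return candidate prefixes under `roots` that look like usable conda installs.

    A directory qualifies if it has a conda-meta folder (the marker of a
    conda prefix) and contains the Python DLL the program was linked against.
    """
    found = []
    for root in roots:
        for name in CANDIDATE_DIRS:
            prefix = os.path.join(root, name)
            has_meta = os.path.isdir(os.path.join(prefix, "conda-meta"))
            has_dll = os.path.isfile(os.path.join(prefix, dll_name))
            if has_meta and has_dll:
                found.append(prefix)
    return found


if __name__ == "__main__":
    # Assumed search roots: the user's home directory and ProgramData.
    roots = [os.path.expanduser("~"), os.environ.get("ProgramData", r"C:\ProgramData")]
    print(find_conda_installs(roots))
```

Since this has to run before python36.dll can be loaded, the same logic would in practice live in the installer or in a small native launcher. The Python here only shows the checks.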

Ray Donnelly

Jul 30, 2018, 1:54:40 PM
to Anaconda - Public, Kasper Peeters


On Mon, Jul 30, 2018, 6:44 PM 'Chris Barker' via Anaconda - Public <anac...@continuum.io> wrote:
> So I suspect that you could call vcpkg from a conda recipe, and conda-build would make a nice package of that lib, taking advantage of the vcpkg build system.


Vcpkg is Windows only and should therefore be avoided. AFAIK it delegates to other build systems anyway, so it is just an unnecessary waste of time from conda's perspective. The clue is in the name, vc. It is aimed at people using MS Visual Studio.


Kasper Peeters

Jul 30, 2018, 2:31:42 PM
to 'Chris Barker' via Anaconda - Public
> I have to agree here, trying to use two independent package
> managers seems like a recipe for disaster.

All nice in theory, but if there is not a single package manager which
has all dependencies, I have no choice. I would happily use vcpkg's
python, but unfortunately people expect to be able to use my program
together with the python packages they installed with their Anaconda
install.

(Rant mode on: the real problem here is that there is no single
canonical python install for Windows, like there is for all Linux
platforms. But let's take that to a different thread, I really do not
intend to go into those issues here).

> So I suspect that you could call vcpkg from a conda recipe, and
> conda-build would make a nice package of that lib, taking advantage
> of the vcpkg build system.

Yes, but I was really hoping that I could just focus on writing my
software, instead of first having to package various external
dependencies. Again, I agree in theory...

> Also -- be sure to look to conda-forge to see how many of your
> dependencies are already there -- you may be pleasantly surprised.

Unfortunately, neither gtk3 nor gtkmm are available yet (for starters).

> you could probably write a startup script that searches the "usual"
> places for a conda install

So let me rephrase my question: has someone on this list written a
script that finds the location of an existing Anaconda install?

> But really, mixing and matching like this seems very fragile.

Developing on Windows is fragile to start with ;-)

Cheers,
Kasper
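
One lead that may be worth checking: as far as I know, recent conda versions keep a per-user list of environment prefixes in a file named environments.txt inside the .conda folder of the home directory. Whether that file exists on a given user's machine is an assumption, so the sketch below treats it as optional and only returns entries that still look like conda prefixes. The function name read_known_environments is invented for illustration.

```python
import os


def read_known_environments(home=None):
    """Return prefixes listed in <home>/.conda/environments.txt that still exist.

    Returns an empty list when the file is missing, since its presence
    depends on the conda version and on how conda has been used.
    """
    home = home or os.path.expanduser("~")
    listing = os.path.join(home, ".conda", "environments.txt")
    if not os.path.isfile(listing):
        return []
    with open(listing, encoding="utf-8") as handle:
        prefixes = [line.strip() for line in handle if line.strip()]
    # Keep only entries that still contain the conda-meta marker directory.
    return [p for p in prefixes if os.path.isdir(os.path.join(p, "conda-meta"))]


if __name__ == "__main__":
    print(read_known_environments())
```

If the list has more than one entry, the program still has to choose, for example by looking for the matching python36.dll in each prefix or by asking the user once at install time.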


Ray Donnelly

Jul 30, 2018, 2:34:22 PM
to Anaconda - Public
On Mon, Jul 30, 2018 at 7:31 PM, Kasper Peeters <kasper....@phi-sci.com> wrote:
> (Rant mode on: the real problem here is that there is no single
> canonical python install for Windows, like there is for all Linux
> platforms. But let's take that to a different thread, I really do not
> intend to go into those issues here).

Every linux distro builds their own Python and they build them all very differently. IMHO the only cross-platform (and truly cross-linux) standard worth its salt is Anaconda Distribution, but I'm biased.



Kasper Peeters

Jul 30, 2018, 2:37:05 PM
to anac...@continuum.io
> Vcpkg is Windows only and should therefore be avoided.

It isn't, but that's beside the point. I would not use either vcpkg
or the Conda package manager on any other platform anyway (Linux distros
have much better package managers already, and on macOS the homebrew
system has, again, the advantage that all my dependencies are there
already). So I am only looking for a Windows solution.

Cheers,
Kasper

Kasper Peeters

Jul 30, 2018, 2:46:51 PM
to anac...@continuum.io
> Every linux distro builds their own Python and they build them all
> very differently.

Which is fine for many; I do not care about how the Python ecosystem
gets to my computer. I mainly care about there being a canonical
one, because otherwise I need to distribute my software for each and
every Python that is available. On Linux, that was true until Anaconda
decided to 'mess' with that and provide an alternative. Now instead of
providing one .deb or .rpm that 'just works', I need to provide at
least two, one which links to the standard python and one that links to
Anaconda's python. From the perspective of a Linux software developer,
things in some sense got worse with Anaconda (but please read on).

> IMHO the only cross-platform (and truly cross-linux) standard worth
> its salt is Anaconda Distribution, but I'm biased.

That's fine, I can see the advantages of having one canonical way to
get all the current versions of all python packages. Anaconda certainly
has its advantages, don't get me wrong. However, it is now slowly trying
to take over the role of 'universal package manager', which directly
clashes with the (excellent) deb/rpm packaging systems. I think that's
a pity, because Anaconda really still has a _long_ way to go before it
can match the security and stability of those systems.

Of course, your goals probably differ from mine; I can see valid
reasons to pursue the Anaconda program.

Cheers,
Kasper


Ray Donnelly

Jul 30, 2018, 2:55:45 PM
to Anaconda - Public
You can call deb/rpm excellent, but I'd beg to differ. They are not in the same game since they are system package managers and conda provides so much more, mainly conda environments. They do not support reproducible data science or experimentation with software updates to ensure your analyses still work. In some cases you might get away with chroots, but generally speaking system package managers install only to /usr and only work in that scenario. Great for managing your base system but of much less use for data science.
 



Kasper Peeters

Jul 30, 2018, 3:09:41 PM
to anac...@continuum.io
> You can call deb/rpm excellent, but I'd beg to differ. They are not
> in the same game since they are system package managers and conda
> provides so much more, mainly conda environments

Yes, but surely things like gtk are in the class of 'system software'.
Why should I use a 'user package manager' (for lack of a better word)
like Conda to install system software?

> They do not support reproducible data science or experimentation with
> software updates

I grant you that deb/rpm could have done more to allow for per-user
installation, rather than system-wide installation. But to throw them
out altogether is a step too far IMHO.

(Incidentally, this has nothing to do with 'data science' per se, it's
more the distinction between system-wide software and per-user
software that you are worried about. That's a valid issue to worry
about, and I agree with you that there is an issue here, but I am not
convinced Conda solves it in general either, though of course it may
solve your _particular_ per-user software installation problems).

I'll repost my modified question under a new thread because we've
probably lost everyone who might have an answer ;-)

Cheers,
Kasper

Ray Donnelly

Jul 30, 2018, 3:27:35 PM
to Anaconda - Public
On Mon, Jul 30, 2018 at 8:09 PM, Kasper Peeters <kasper....@phi-sci.com> wrote:
>> You can call deb/rpm excellent, but I'd beg to differ. They are not
>> in the same game since they are system package managers and conda
>> provides so much more, mainly conda environments
>
> Yes, but surely things like gtk are in the class of 'system software'.
> Why should I use a 'user package manager' (for lack of a better word)
> like Conda to install system software?

I don't believe such a class exists, at least not for categorizing software. It *becomes* 'system software' when installed on a system, usually via a system package manager. Would you call Qt 'system software'? We provide that.

>> They do not support reproducible data science or experimentation with
>> software updates
>
> I grant you that deb/rpm could have done more to allow for per-user
> installation, rather than system-wide installation. But to throw them
> out altogether is a step too far IMHO.

I use system package managers to update my system, but I keep my system as far away from my work load as possible because I may want to transfer my work to another OS and I want it to remain portable. I also want to use AD software and not some old stuff that my distro released months ago (or years ago depending on which distro you're stuck with).

> (Incidentally, this has nothing to do with 'data science' per se, it's
> more the distinction between system-wide software and per-user
> software that you are worried about. That's a valid issue to worry
> about, and I agree with you that there is an issue here, but I am not
> convinced Conda solves it in general either, though of course it may
> solve your _particular_ per-user software installation problems).

How does conda not solve it in general? It's a specific goal. I agree that the number of people who care about reproducibility extends beyond data science, but it's a very good target audience to aim at when trying to make a high quality modern cross platform software distribution.

> I'll repost my modified question under a new thread because we've
> probably lost everyone who might have an answer ;-)


Indeed!

Kasper Peeters

Jul 30, 2018, 4:03:50 PM
to anac...@continuum.io
(let me know if you want to take this to private email; I don't mind
shaping my opinion on this through this discussion, but I can imagine
not everyone is interested).

> I don't believe such a class exists, at least not for categorizing
> software. It *becomes* 'system software' when installed on a system,
> usually via a system package manager. Would you call Qt 'system
> software'? We provide that.

I know, and I think it's taking you on the path of becoming 'the
Anaconda system', that is to say, a user-environment which you can
install on any machine and have full control over as a user. In your
world view, a user will work in 'an Anaconda system', with no real
sense of what is underneath except that it enables all Anaconda
software to run.

That is very close to a 'Debian system' or a 'Red Hat system'. You can
install those on almost any hardware, and if you only have an account
on a machine administered by someone else, you can always run them in
a virtual machine. What is the difference between saying 'my data
science project runs in this particular Anaconda environment' and 'my
data science software runs on this particular version of Debian in a
VM'? You can run a stable, rock solid bare bones Debian system on your
hardware, and then the most up-to-date Linux distro on top of that in a
VM. What's the difference between that and Anaconda's proposal of
running a bare bones OS and then Anaconda on top?

I have the feeling that the main problem you are trying to solve is the
one of getting faster software updates than those offered by
'traditional' systems. But would it then not be better to put your
weight behind those traditional systems to try to speed things up?

The history of Debian and Red Hat (and the many other distros out
there) has shown that making a 'complete' distribution is incredibly
hard. There is a reason why they do not update so fast.

> I use system package managers to update my system, but I keep my
> system as far away from my work load as possible

Yes, so as I wrote above: you have a bare-bones system, just enough to
run Anaconda from your user account on top of that. For that to work
flawlessly, you need to reproduce everything that traditional Linux
distros have done, otherwise you get a weird mixture of system-provided
software and Anaconda-provided software (and indeed, that's the
situation that we're in now).

> to transfer my work to another OS and I want it to remain portable

If you have a Linux VM you can take it to any OS you like ;-)

> also want to use AD software and not some old stuff that my distro
> released months ago (or years ago depending on which distro you're
> stuck with).

So it's update speed indeed?

> How does conda not solve it in general?

It does not solve security and dependency issues nearly as well as
deb/rpm. There are many things (mail/web/file server for example) which
you would not do by installing a bare bones OS and then Anaconda on top
from which you run postfix/apache/samba (there is a reason why Conda
does not package postfix). You may not feel it this way, but you also do
draw a line between 'real system software' and 'things which a user may
want to change', and for good reasons.

To be less negative: what I would personally have done is to help one
of the major Linux distros in extending their package manager so that
it can install things per-user, and from 'unstable' repositories. Then
you would have no artificial boundary between underlying system
software and user-installed software.

Cheers,
Kasper


Chris Barker
Jul 30, 2018, 4:35:18 PM
to anaconda
Just a thought or two, and then you are right, the discussion should probably be about solving (or not) your problem.

> I know, and I think it's taking you on the path of becoming 'the
> Anaconda system', that is to say, a user-environment which you can
> install on any machine and have full control over as a user.

yeah, it kind of is -- and, in fact, for services as well -- I kinda see conda as a lightweight Docker :-)

> I have the feeling that the main problem you are trying to solve is the
> one of getting faster software updates than those offered by
> 'traditional' systems.

well, that, and also environments -- that is a BIG deal -- the ability to run multiple different dependency stacks on one system at once in a lightweight way is really useful.

which is why I think of it as a mini-docker, because that's why Docker has become so popular -- you can have a self-contained system with a specific set of dependencies -- and easily run multiple versions on one machine, copy it over to other machines, etc. In theory, you could "just run a VM" with Debian or something to accomplish the same thing -- but folks seem to like Docker a lot...

Back to the original question -- it seems you have users that really want your software to run with conda -- I can see why that's a pain for you -- one more system to support, if you want them to also run on bare Windows, and various Linux distros, and OS-X ....

But maybe you could embrace it -- if you support conda -- that could be the *one* platform you need to support :-)

BTW, there is a canonical Python on Windows and OS-X -- the one you download from python.org.


-CHB


Ray Donnelly
Jul 30, 2018, 4:44:03 PM
to Anaconda - Public


On Mon, Jul 30, 2018, 9:03 PM Kasper Peeters <kasper....@phi-sci.com> wrote:
> (let me know if you want to take this to private email; I don't mind
> shaping my opinion on this through this discussion, but I can imagine
> not everyone is interested).
>
> > I don't believe such a class exists, at least not for categorizing
> > software. It *becomes* 'system software' when installed on a system,
> > usually via a system package manager. Would you call Qt 'system
> > software'? We provide that.
>
> I know, and I think it's taking you on the path of becoming 'the
> Anaconda system', that is to say, a user-environment which you can
> install on any machine and have full control over as a user. In your
> world view, a user will work in 'an Anaconda system', with no real
> sense of what is underneath except that it enables all Anaconda
> software to run.
>
> That is very close to a 'Debian system' or a 'Red Hat system'. You can
> install those on almost any hardware, and if you only have an account
> on a machine administered by someone else, you can always run them in
> a virtual machine. What is the difference between saying 'my data
> science project runs in this particular Anaconda environment' and 'my
> data science software runs on this particular version of Debian in a
> VM'? You can run a stable, rock solid bare bones Debian system on your
> hardware, and then the most up-to-date Linux distro on top of that in a
> VM. What's the difference between that and Anaconda's proposal of
> running a bare bones OS and then Anaconda on top?

Needing access to a computer with some virtualization software? A distro modern enough to run Docker? Permissions to do so? Anaconda targets a much lower requirement, a user account, and as such is useful in vastly more scenarios, *including* Docker and VMs.

> I have the feeling that the main problem you are trying to solve is the
> one of getting faster software updates than those offered by
> 'traditional' systems. But would it then not be better to put your
> weight behind those traditional systems to try to speed things up?

Providing up-to-date, securely built software in a uniform way across most Linux distros, macOS, and Windows is our main goal. That, and reproducible, relocatable computing environments.


> The history of Debian and Red Hat (and the many other distros out
> there) has shown that making a 'complete' distribution is incredibly
> hard. There is a reason why they do not update so fast.
>
> > I use system package managers to update my system, but I keep my
> > system as far away from my work load as possible
>
> Yes, so as I wrote above: you have a bare-bones system, just enough to
> run Anaconda from your user account on top of that. For that to work
> flawlessly, you need to reproduce everything that traditional Linux
> distros have done, otherwise you get a weird mixture of system-provided
> software and Anaconda-provided software (and indeed, that's the
> situation that we're in now).

No, this is not the correct way to use AD. We depend only on the system's X11 on Linux and glibc. We do not run any other system software at all. Introducing that to AD would be a very bad idea.

> > to transfer my work to another OS and I want it to remain portable
>
> If you have a Linux VM you can take it to any OS you like ;-)
>
> > also want to use AD software and not some old stuff that my distro
> > released months ago (or years ago depending on which distro you're
> > stuck with).
>
> So it's update speed indeed?

Performance, security, update speed, uniformity, compatibility. All the things.


> > How does conda not solve it in general?
>
> It does not solve security and dependency issues nearly as well as
> deb/rpm. There are many things (mail/web/file server for example) which
> you would not do by installing a bare bones OS and then Anaconda on top
> from which you run postfix/apache/samba (there is a reason why Conda
> does not package postfix). You may not feel it this way, but you also do
> draw a line between 'real system software' and 'things which a user may
> want to change', and for good reasons.

I have literally no idea why postfix would be problematic for AD to package if we wanted to.

> To be less negative: what I would personally have done is to help one
> of the major Linux distros in extending their package manager so that
> it can install things per-user, and from 'unstable' repositories. Then
> you would have no artificial boundary between underlying system
> software and user-installed software.

That would be of no use to us or our users. They want to use AD; it suits their needs better than the other suggestions you've given, all of which are less general than AD's reach.


> Cheers,
> Kasper


--
Community Discussion Forum for Anaconda
---
You received this message because you are subscribed to the Google Groups "Anaconda - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anaconda+u...@continuum.io.

Kasper Peeters
Jul 30, 2018, 5:05:30 PM
to anac...@continuum.io
> Needing access to a computer with some virtualization software? A
> distro modern enough to run docker? Permissions to do so? Anaconda
> targets a much lower requirement, a user account and as such is
> useful in vastly more scenarios, *including* docker and VMs

Yes, but that goes at the expense of losing other advantages (security,
stability) of existing distros.

> No this is not the correct way to use AD. We depend only on the
> systems X11 on Linux and glibc.

Because it is hard to make that user-installable as well? (it's
certainly not impossible) Or because it really doesn't make much sense
to be able to swap out glibc or X11?

> We do not run any other system software at all. Introducing that to
> AD would be a very bad idea.

Ok, I am getting your point, but for me this means you suffer the
not-invented-here-syndrome, spending a lot of manpower on re-inventing
many (nontrivial) wheels.

> I have literally no idea why postfix would be problematic for AD to
> package if we wanted to.

You can't even bind to port 25 as a normal user, so why expect a normal
user to install their own private copy of postfix? It doesn't make
sense, just like it doesn't make sense to have users install their own
glibc. Or, if you ask me, to have users install their own Qt or Gtk.

Cheers,
Kasper


Ray Donnelly
Jul 30, 2018, 5:13:43 PM
to Anaconda - Public
On Mon, Jul 30, 2018 at 10:05 PM, Kasper Peeters <kasper....@phi-sci.com> wrote:
> > Needing access to a computer with some virtualization software? A
> > distro modern enough to run docker? Permissions to do so? Anaconda
> > targets a much lower requirement, a user account and as such is
> > useful in vastly more scenarios, *including* docker and VMs
>
> Yes, but that goes at the expense of losing other advantages (security,
> stability) of existing distros.
>
> > No this is not the correct way to use AD. We depend only on the
> > systems X11 on Linux and glibc.
>
> Because it is hard to make that user-installable as well? (it's
> certainly not impossible) Or because it really doesn't make much sense
> to be able to swap out glibc or X11?
>
> > We do not run any other system software at all. Introducing that to
> > AD would be a very bad idea.
>
> Ok, I am getting your point, but for me this means you suffer the
> not-invented-here-syndrome, spending a lot of manpower on re-inventing
> many (nontrivial) wheels.
>
> > I have literally no idea why postfix would be problematic for AD to
> > package if we wanted to.
>
> You can't even bind to port 25 as a normal user, so why expect a normal
> user to install their own private copy of postfix? It doesn't make
> sense,

I take your point about a port being in use, but there's still no reason we could not or should not package postfix for people who'd rather use one we built than their distro's. FWIW, since you keep going back to security: I believe we take better care of low-level binary security and performance than the average Linux distro.

 
> just like it doesn't make sense to have users install their own
> glibc. Or, if you ask me, to have users install their own Qt or Gtk.


I disagree entirely. How can we, as an independent distribution, make binaries (say PyQt) that are supposed to work with both Fedora and Ubuntu, and worse, at whatever version some end-user *happens* to have installed at the time? Do you expect PyQt 5.9 to work with Qt 5.6.3? Or PyQt 5.9 compiled against Ubuntu's Qt 5.9 to work correctly when launched on Fedora? These things are simply not possible. If your suggestion to avoid this problem is that we 'throw in' with either Ubuntu or Fedora, that is also not of interest to us. If I wanted to work on Fedora or Ubuntu I'd have applied for a job with the respective companies. The Anaconda Distribution is on a very different and distinct trajectory from those Linux distros (and one I think is far better for a huge array of users and use-cases).


 
> Cheers,
> Kasper

