Adding pip packages


Miro Hodak

Apr 15, 2020, 12:10:19 AM
to singularity
Hello,

my users are using singularity containers for running Python tasks on a Slurm cluster. One problem that often comes up is that users want to use pip to add different libraries on top of what is in the container but that is not allowed with the default read-only format. What would be the best way to do this?

I am aware of overlays, but they are root-only (unless they are on ext3, which I am not using). I also want changes to persist, so that once a library is installed it can be used later. Basically, I would like to replicate Python virtual environment functionality with singularity at a user level, without users creating singularity containers themselves. Is there a way to accomplish this?

Kandes, Martin

Apr 15, 2020, 1:13:27 AM
to singularity
Hi Miro,

The simple, quick fix is to have the user install the package in their HOME directory using something like pip install --user <package> [1]. Long-term, this will likely become unmanageable for the user, i.e., keeping track of which Python packages were installed inside the containerized environment and which were not. So yes, you might also try setting up virtualenvs to keep things more structured. But again, this assumes users are familiar with how to use virtualenvs. It's always a trade-off how complicated it needs to get for them.

Marty

[1]

[mkandes@comet-ln2 ~]$ singularity shell /share/apps/compute/singularity/images/ubuntu/ubuntu.simg
Singularity> which python
/opt/miniconda3/bin/python
Singularity> python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'
>>> exit()
Singularity> which pip
/opt/miniconda3/bin/pip
Singularity> pip install --user numpy
Collecting numpy
     |████████████████████████████████| 20.2MB 13.1MB/s
Installing collected packages: numpy
  WARNING: The scripts f2py, f2py3 and f2py3.7 are installed in '/home/mkandes/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed numpy-1.18.2
Singularity> python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.18.2'
>>> numpy.__file__
'/home/mkandes/.local/lib/python3.7/site-packages/numpy/__init__.py'
>>>
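
The same check can be scripted instead of typed at the prompt. A minimal sketch, using only the standard site and importlib modules; the helper name describe_module is made up for illustration and is not part of Singularity or pip:

```python
import importlib
import os
import site


def describe_module(name):
    """Return (file path, True if the module was loaded from the per-user site dir)."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None) or ""
    # Per-user site directory, e.g. ~/.local/lib/python3.7/site-packages
    user_site = os.path.join(site.getusersitepackages(), "")
    return path, path.startswith(user_site)


if __name__ == "__main__":
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("user site dir:", site.getusersitepackages())
    # Replace "json" with the package you just installed, e.g. "numpy".
    print(describe_module("json"))
```

A True in the second field means the import was satisfied from the home directory rather than from the container image.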



Gregory M. Kurtzer

Apr 16, 2020, 11:02:43 PM
to singularity
Just to add to Marty's response, installing any software into the user's home directory may adversely affect the software stack reproducibility of a containerized workload. The effect could be as drastic as the workload simply not working, or as subtle as different software versions being used without your direct knowledge. It also affects the portability of that containerized workload, as it is now tied to a particular user's directory on a particular system.

For this reason, I would strongly suggest that you create containers with all of the libraries necessary for a particular contained workload and use them as-is with no external dependencies. You may even want to try virtualizing your home directory to a subdirectory to ensure that the container is working as expected with no external influences.
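
One way to spot such external influences from inside the container is to list which loaded modules live outside the interpreter's own prefix. This is only a rough sketch: the function name modules_outside is hypothetical, and it assumes the container's Python lives under sys.prefix (as /opt/miniconda3 does in Marty's example).

```python
import os
import sys


def modules_outside(prefix, modules=None):
    """List (name, path) for loaded modules whose files live outside `prefix`."""
    modules = sys.modules if modules is None else modules
    root = os.path.join(os.path.realpath(prefix), "")
    outside = []
    for name, module in sorted(modules.items()):
        path = getattr(module, "__file__", None)
        if not path:
            continue  # built-in and namespace modules have no file
        if not os.path.realpath(path).startswith(root):
            outside.append((name, path))
    return outside


if __name__ == "__main__":
    # Inside the container, run this after importing the workload's dependencies.
    for name, path in modules_outside(sys.prefix):
        print(f"{name}: {path}")
```

Anything printed here came from outside the image, typically from ~/.local. Python itself can also be told to ignore the per-user site directory by setting the PYTHONNOUSERSITE environment variable or running python -s, which is a quick way to test whether a workload still runs without it.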

Hope that helps,
Greg


Kandes, Martin

Apr 16, 2020, 11:18:43 PM
to singularity
Miro,

I'll strongly second Greg's point. The --user dump into the HOME directory is a quick fix, not a long-term solution. The long-term solution is to have a container with all of the necessary dependencies installed in the container itself. Of course, this is the ideal situation. The practical question is who is going to build and maintain the container. You? The user? Someone else? I always encourage our users to customize the base containers we have to suit their specific needs, because we don't have the time to create a custom container for everyone. But I know this is a big ask for some users.

Marty



v

Apr 16, 2020, 11:55:46 PM
to singu...@lbl.gov
I want to quickly point out that there are two different use cases here, and both are valid. The first is using a container as an interactive development environment, for which you do want a writable environment so you can install software and test things out. It’s very reasonable to install things to home so you can do exactly this, with the majority of the software stack being provided by the container. It’s not perfect, but it’s still a small improvement over not having any hardened base that you can build further on top of.

The second use case is what most folks in this conversation are talking about - the final generation of a container to run a production workflow that is reproducible. This is the container that you would build when you are 99% sure that you are done and ready to start testing your workflow in “production mode.”

So - with this in mind, I think it’s very reasonable to have a setup that provides container bases with core languages and major libraries that users can build on top of. You would then want to teach a routine where the user tests and develops in the quasi-writable environment with the goal of building a read-only SIF image.
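
To move from the quasi-writable stage to the final image, it helps to record exactly what ended up in the home directory. A minimal sketch, assuming Python 3.8 or newer for importlib.metadata (newer than the 3.7.4 shown earlier in the thread); the function name pinned_requirements is made up for illustration:

```python
from importlib import metadata  # Python 3.8+
import site


def pinned_requirements(directory):
    """Return sorted 'name==version' pins for distributions installed in `directory`."""
    pins = set()
    for dist in metadata.distributions(path=[directory]):
        name = dist.metadata.get("Name")
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print what `pip install --user` has accumulated so far.
    print("\n".join(pinned_requirements(site.getusersitepackages())))
```

The resulting pins can be saved to a requirements file and installed during the build of the read-only image. On older Pythons, pip freeze --user produces a similar list.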

It’s kind of like testing a lot of recipes before you write down the final one, and make the cookies to bring to the grand event. :)


Gregory M. Kurtzer

Apr 17, 2020, 11:37:28 AM
to singularity
Great points Vanessa!

In my experience, most of that interactive development usually occurs on workstations and development environments that the user owns or has direct access to (e.g. laptops, workstations, dedicated cloud instances, etc.).

How much of this is necessary to do on the HPC system itself? And, in theory, would using --fakeroot (which depends on a recent kernel because it relies on user namespaces) together with a sandbox provide the necessary support?

Greg




--
Gregory M. Kurtzer

Kandes, Martin

Apr 17, 2020, 7:13:57 PM
to singularity
Greg,

I would say we have to use the pip install --user option regularly to bridge the gap between the base containers we build and support for users and what those users actually need, since the containers may not have ALL of the specific Python packages required in conjunction with, say, something like TensorFlow. So yes, it's a regular practice for us.

As for --fakeroot, it may not be allowed on some HPC systems. For example, I think our current kernel is compatible, but also too old security-wise to use it in production right now. So I'm not sure if the --fakeroot option is the better way to go long-term. But honestly, I simply don't have any experience using it yet in a production environment with users.
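
For anyone wanting a quick look at whether a given node even exposes user namespaces, here is a rough sketch. It only reads two kernel settings under /proc/sys and is a best-effort guess, not a statement about what Singularity itself checks; the function names are hypothetical, and --fakeroot additionally needs the admin to have configured /etc/subuid and /etc/subgid mappings.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysctl-style file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def userns_available(proc_sys="/proc/sys"):
    """Best-effort guess at whether unprivileged user namespaces are enabled."""
    max_ns = read_int(Path(proc_sys) / "user" / "max_user_namespaces")
    if max_ns is None or max_ns == 0:
        return False
    # Debian and Ubuntu kernels add this extra switch; it is absent elsewhere.
    clone = read_int(Path(proc_sys) / "kernel" / "unprivileged_userns_clone")
    return clone is None or clone == 1


if __name__ == "__main__":
    print("user namespaces look available:", userns_available())
```

A False here means --fakeroot is very unlikely to work on that node; a True only means the kernel side is not the obstacle.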

Marty
