1) Do you use GPUs?
With most container platforms you must install an NVIDIA driver directly into the container. Because the NVIDIA driver installs kernel modules, and the kernel is shared between the container and the host, the driver version inside the container must exactly match the version running on the host. This breaks the portability of the container. With Singularity you don’t have to install an NVIDIA driver into your container. Instead, you simply pass the --nv option at runtime, and Singularity will automatically locate the NVIDIA libraries and binaries on the host and map them into your container. This is an easy, portable solution.
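For example, a GPU-enabled container might be run like this (the image and script names here are placeholders, not part of any real setup):

    $ singularity exec --nv mycontainer.img python gpu_script.py

The --nv flag is the only change needed at runtime; the same image can then move to any host with a working NVIDIA driver.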
2) Do you use MPI?
It is extremely difficult to set up a multi-node MPI job with a traditional container platform. Singularity solves this problem by allowing users to run MPI jobs using a “hybrid” model. The model assumes that MPI is installed both on the host and within the container. A user invokes a containerized program using mpirun or a similar command. Singularity is MPI-aware and will set up new containers on all of the nodes within the MPI job, then facilitate communication between the MPI on the host system and the MPI within the container. Often it is not even necessary to match the MPI versions on the host and within the container, because different MPI versions offer some degree of compatibility with one another.
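Under the hybrid model, the host’s mpirun launches one container per rank. A minimal sketch (the image and program names are placeholders, and this assumes compatible MPI installations on the host and in the image):

    $ mpirun -np 64 singularity exec mycontainer.img ./my_mpi_app

Each rank starts its own container instance, and the MPI library inside the image talks to the host’s MPI runtime just as it would in a bare-metal job.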
3) Do a lot of non-root users utilize your HPC system?
Security is a huge concern with containers, but many container platforms focus exclusively on security within the container. In other words, how do you protect the contents of a container from a potentially hostile environment? In a multi-tenant HPC environment, system administrators have the opposite concern: how do you protect the HPC environment from a potentially malicious container? Singularity has a novel security paradigm that allows untrusted users to run untrusted containers safely. In a nutshell, Singularity prevents users from escalating privileges within the container. If you don’t have root on the host system, you can’t have root within the container. Moreover, users have the same UID/GID context inside the container as outside, allowing them to access data they own and preventing them from accessing data they don’t. And unlike other container platforms, Singularity runs without any root-owned daemon processes, decreasing the potential attack surface.
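You can see the identity preservation for yourself. In an illustrative (hypothetical) session, id reports the same user on the host and inside the container:

    $ id -un
    alice
    $ singularity exec mycontainer.img id -un
    alice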
4) Do you use a batch scheduling system (like Slurm or PBS for instance)?
Traditional container platforms that launch containers with the aid of a background daemon process don’t work well with HPC batch schedulers. The daemon allows containers to launch outside the scope of the resource manager, which can then no longer track the resources consumed by the containerized processes. Singularity instead starts containers with the appropriate UID/GID context, and once the containerized processes are initiated, Singularity execs itself out of existence; the containerized processes then run on the system (within their namespaces) just like any other processes. This architecture allows the resource manager to track utilization by the container and the batch scheduler to schedule other jobs accordingly.
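Because containerized processes look like ordinary processes to the resource manager, Singularity drops into batch scripts with no special handling. A minimal Slurm sketch (the image, program, and resource values are placeholders):

    #!/bin/bash
    #SBATCH --job-name=container-job
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00

    srun singularity exec mycontainer.img ./my_app

The scheduler accounts for this job exactly as it would for a non-containerized run.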
5) Do you use a parallel file system (like Lustre or GPFS for instance)?
Singularity can actually boost performance on parallel file systems over bare metal. Parallel file systems can exhibit reduced performance when many processes simultaneously open large numbers of small files. Take Python as an example: a single invocation of the Python interpreter may stat thousands of files, so if you are running an embarrassingly parallel Python job with thousands of simultaneous interpreters, your file system will grind to a halt as it essentially sustains a DDoS attack. Singularity containers are single image files that are mounted via the host’s loop device. When data is accessed in this manner, the kernel can take advantage of built-in optimizations that reduce the number of metadata operations necessary to run your Python job. At large scale, this can improve file-system performance by several orders of magnitude. This fact prompted admins at the SLAC National Accelerator Laboratory US ATLAS Computing center to containerize their entire software stack (~400 GB) using Singularity.
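As a rough illustration (the paths and names are placeholders), the entire software stack travels as one file, so the parallel file system services a single image open rather than thousands of small-file lookups:

    $ ls /lustre/project/
    mycontainer.img
    $ singularity exec /lustre/project/mycontainer.img python my_analysis.py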
--
Thank you very much for your answer and your link, I will have a look at it ASAP.
Also, I watched your presentation at NVIDIA GTC and I have to say it was really fun!
I would be glad to contribute to the growth of Singularity, which sounds like a very interesting solution. I will let you know about our choice and any work regarding Singularity.