Hi all,
We’ve just merged PR #711 into the SingularityCE git repository, which adds a new `--sif-fuse` flag to enable experimental mounting of SIF files with squashfuse in unprivileged user-namespace flows. You can try it out on a build from the master branch.
You’ll need to have `squashfuse` installed on your system, and use the `--sif-fuse` flag in conjunction with `-u` or an unprivileged install. SingularityCE will then try to mount the SIF using squashfuse, rather than extracting it to a temporary sandbox.
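For example, with a build from master and an existing SIF image, the flag can be combined with user-namespace mode along these lines (the image name here is just illustrative):

singularity exec --sif-fuse -u ubuntu.sif cat /etc/os-release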
Whenever the topic of squashfuse mounts has come up, we’ve had various questions about performance, so here is a very quick and dirty benchmark that shows why setuid kernel-level mounts of SIF files can still be beneficial.
System: Lenovo P700, dual Xeon E5-2680 v3, 80GB RAM.
Storage: WD Black 500GB NVMe, PCIe Gen 3.0 x4 connection.
Host OS: Fedora 35
Squashfuse: 0.1.104
This is somewhat atypical of most HPC scenarios, as we are using local NVMe storage rather than a network parallel file system. However, the CPU configuration is similar to a lot of HPC nodes, and we’ll be working with a SIF file that is small enough to be fully cached in RAM. We’re focusing on the speed of the squashfs implementations, rather than the underlying storage.
An Ubuntu 20.04 container was built containing an installation of ‘fio’ and a 1GB test file full of random data. Simple random read and sequential read tests were then performed with 16 jobs on this 24-core system, using the following definition file:
Bootstrap: docker
From: ubuntu:20.04
%post
apt -y update
apt -y install fio
dd if=/dev/random of=/1gb-file bs=1M count=1024
%runscript
fio --filename=/1gb-file --rw=randread --bs=4k --ioengine=libaio \
--runtime=60 --numjobs=16 --time_based --group_reporting \
--name=iops-test-job --eta-newline=1 --readonly
fio --filename=/1gb-file --rw=read --bs=64k --ioengine=libaio \
--runtime=60 --numjobs=16 --time_based \
--group_reporting --name=throughput-test-job --eta-newline=1 --readonly
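To reproduce the comparison, a definition file like the one above can be built into a SIF and run twice, once via the default setuid flow (kernel squashfs mount) and once with the new flag (file names here are illustrative, not the exact ones we used):

sudo singularity build fio-bench.sif fio-bench.def

# Setuid flow - privileged kernel squashfs mount of the SIF
singularity run fio-bench.sif

# Unprivileged flow - squashfuse mount of the SIF
singularity run --sif-fuse -u fio-bench.sif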
Privileged kernel squashfs mount of SIF
The random read test gave a result of 1,392k IOPS, with 10.75% usr CPU and 46.56% sys CPU.
The sequential read test gave a result of 5,437 MiB/s, with 1.02% usr CPU and 31.77% sys CPU.
Unprivileged squashfuse mount of SIF
The random read test gave an average of 85.1k IOPS, with 1.45% usr CPU and 3.68% sys CPU.
The sequential read test gave an average of 4,793 MiB/s, with 1.15% usr CPU and 30.33% sys CPU.
Note: there is an additional squashfuse process outside the container that consumes 100% of a CPU core during the test; it is not counted in the fio CPU stats above.
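If you want to see this for yourself, the FUSE daemon is visible on the host while the benchmark is running, e.g. (a quick sketch; the exact process name and output will vary):

ps -o pid,pcpu,comm -C squashfuse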
Summary
For pure sequential reads, squashfuse appears to be quite competitive with the kernel squashfs mount on the test system, in terms of the throughput achieved.
For random reads, the kernel squashfs mount provided approx. 16x as many IOPS as the squashfuse approach. This likely mainly reflects the fact that kernel squashfs is multi-threaded, while squashfuse is not.
Note that the headline figure of >1.3 million IOPS with the privileged kernel squashfs mount is in excess of what the underlying NVMe storage can deliver (542k IOPS when benchmarked). This reflects the fact that the SIF container is being cached in RAM, and the reads are served from that cached data, not from the underlying NVMe storage.
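For reference, a raw-device baseline of that kind can be obtained with a direct I/O fio run against the NVMe drive, along these lines (a sketch only; the device path and parameters are illustrative, not necessarily those used for the 542k figure):

fio --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --ioengine=libaio \
--iodepth=32 --runtime=60 --numjobs=16 --time_based --group_reporting \
--name=nvme-baseline --readonly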
Based on this initial work we expect squashfuse-mounted SIF to be suitable for many workflows, but to exhibit a penalty for applications that access extremely large numbers of small files in the container, e.g. startup of very large and complex Python applications. This will lessen the advantage that SIF containers have over non-container installs of such applications.
We’ll revisit this in more detail, in the form of a blog post, at a later date.
Cheers,
DT