Hi all,
We’ve just merged PR #711 into the SingularityCE git repository, which adds a new `--sif-fuse` flag to enable experimental mounting of SIF files with squashfuse in unprivileged user-namespace flows. You can try it out on a build from the master branch.
You’ll need to have `squashfuse` installed on your system, and use the `--sif-fuse` flag in conjunction with `-u` or an unprivileged install. SingularityCE will then try to mount the SIF using squashfuse, rather than extracting it to a temporary sandbox.
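For example, with a build from master and an existing SIF image, the flag can be combined with user-namespace mode along these lines (the image name here is just illustrative):

singularity exec --sif-fuse -u ubuntu.sif cat /etc/os-release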
Whenever the topic of squashfuse mounts has come up, we’ve had various questions about performance, so here is a very quick and dirty benchmark that shows why setuid kernel-level mounts of SIF files can still be beneficial.
System: Lenovo P700, dual Xeon E5-2680 v3, 80GB RAM.
Storage: WD Black 500GB NVMe, PCIe Gen 3.0 x4 connection.
Host OS: Fedora 35
Squashfuse: 0.1.104
This is somewhat atypical of most HPC scenarios, as we are using local NVMe storage rather than a network parallel file system. However, the CPU configuration is similar to a lot of HPC nodes, and we’ll be working with a SIF file that is small enough to be fully cached in RAM. We’re focusing on the speed of the squashfs implementations, rather than the underlying storage.
An Ubuntu 20.04 container was built containing an installation of ‘fio’ and a 1GB test file full of random data. Simple random read and sequential read tests were then performed with 16 jobs on this 24-core system, using the following definition file:
Bootstrap: docker
From: ubuntu:20.04
%post
apt -y update
apt -y install fio
dd if=/dev/random of=/1gb-file bs=1M count=1024
%runscript
fio --filename=/1gb-file --rw=randread --bs=4k --ioengine=libaio \
--runtime=60 --numjobs=16 --time_based --group_reporting \
--name=iops-test-job --eta-newline=1 --readonly
fio --filename=/1gb-file --rw=read --bs=64k --ioengine=libaio \
--runtime=60 --numjobs=16 --time_based \
--group_reporting --name=throughput-test-job --eta-newline=1 --readonly
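To reproduce the comparison, a definition file like the one above can be built into a SIF and run twice, once via the default setuid flow (kernel squashfs mount) and once with the new flag (file names here are illustrative, not the exact ones we used):

sudo singularity build fio-bench.sif fio-bench.def

# Setuid flow - privileged kernel squashfs mount of the SIF
singularity run fio-bench.sif

# Unprivileged flow - squashfuse mount of the SIF
singularity run --sif-fuse -u fio-bench.sif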
Privileged kernel squashfs mount of SIF
The random read test gave a result of 1,392k IOPS, with 10.75% usr CPU and 46.56% sys CPU.
The sequential read test gave a result of 5,437 MiB/s, with 1.02% usr CPU and 31.77% sys CPU.
Unprivileged squashfuse mount of SIF
The random read test gave an average of 85.1k IOPS, with 1.45% usr CPU and 3.68% sys CPU.
The sequential read test gave an average of 4,793 MiB/s, with 1.15% usr CPU and 30.33% sys CPU.
Note: there is an additional squashfuse process outside the container that consumes 100% of a CPU core during the test; it is not counted in the fio CPU stats above.
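If you want to see this for yourself, the FUSE daemon is visible on the host while the benchmark is running, e.g. (a quick sketch; the exact process name and output will vary):

ps -o pid,pcpu,comm -C squashfuse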
Summary
For pure sequential reads, squashfuse appears to be quite competitive with the kernel squashfs mount on the test system, in terms of the throughput achieved.
For random reads, the kernel squashfs mount provided approx. 16x as many IOPS as the squashfuse approach. This likely mainly reflects the fact that kernel squashfs is multi-threaded, while squashfuse is not.
Note that the headline figure of >1.3 million IOPS with the privileged kernel squashfs mount is in excess of what the underlying NVMe storage can deliver (542k IOPS when benchmarked). This reflects the fact that the SIF container is being cached in RAM, and the reads are served from that cached data, not from the underlying NVMe storage.
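For reference, a raw-device baseline of that kind can be obtained with a direct I/O fio run against the NVMe drive, along these lines (a sketch only; the device path and parameters are illustrative, not necessarily those used for the 542k figure):

fio --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --ioengine=libaio \
--iodepth=32 --runtime=60 --numjobs=16 --time_based --group_reporting \
--name=nvme-baseline --readonly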
Based on this initial work we expect squashfuse-mounted SIF to be suitable for many workflows, but to exhibit a penalty for applications that access extremely large numbers of small files in the container, e.g. startup of very large and complex Python applications. This will lessen the advantage that SIF containers have over non-container installs of such applications.
We’ll revisit this in more detail, in the form of a blog post, at a later date.
Cheers,
DT