Singularity build hangs for large containers

Mark Miesch

Mar 1, 2021, 7:03:32 PM
to singularity

I'm trying to build some fairly large containers that include the Intel oneAPI compilers and runtime libraries.

I keep finding that the `singularity build` command hangs at the stage where it is trying to build the SIF file:

INFO:    Creating SIF file...

It can sit there for over an hour if I let it, but eventually it times out with this:

client_loop: send disconnect: Broken pipe

The container is built from a previous container, which is 3.1 GB in size. When the build works, the container I am trying to build is 7.6 GB.

I am building this on an AWS instance running Ubuntu 18.04 that was specifically configured for memory-intensive workloads. It has 128 GB of memory, so it should be fine. `free` shows 110G available, and when I run `top` during the build it looks like I have plenty of memory. I also have 250 GB of free disk space.

Yet, even with all this free space, I can sometimes get it to work by running:
docker system prune -a
sudo singularity cache clean

And then clearing out /var/tmp and /tmp.

This worked on Friday, but today even this doesn't work. I tried it three times and the build hangs every time.

This seems like a cumulative problem: builds work until they don't, which suggests memory. But I don't know what else to clean; as far as I can tell there should be plenty of memory available. Is there somewhere else that Singularity stores cached images? Anywhere else to clean? Is there something I am missing?
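On the "where else" question, here is a minimal sketch assuming Singularity 3.x defaults (verify against the docs for your version): the image cache lives in `$SINGULARITY_CACHEDIR`, defaulting to `$HOME/.singularity/cache` (so `/root/.singularity/cache` when building with sudo), and build scratch space goes to `$SINGULARITY_TMPDIR`, falling back to `$TMPDIR` and then `/tmp`. The function name `singularity_dirs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the directories Singularity 3.x is assumed to
# use for its image cache and for build scratch space, plus their sizes.
singularity_dirs() {
    cache="${SINGULARITY_CACHEDIR:-$HOME/.singularity/cache}"
    tmp="${SINGULARITY_TMPDIR:-${TMPDIR:-/tmp}}"
    echo "cache: $cache"
    echo "tmp: $tmp"
    # Missing directories are not an error here; just skip their sizes.
    du -sh "$cache" "$tmp" 2>/dev/null || true
}

singularity_dirs
```

Run it the same way you run the build (e.g. under `sudo`), since the cache path depends on whose `$HOME` is in effect.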

I just dropped the kernel caches and am trying one more time.

Thanks for any insights you may have.

Mark Miesch

Mar 1, 2021, 7:52:48 PM
to singularity, Mark Miesch

Well, lo and behold, that did it. Running this (as root) before the build did the trick:

sync; echo 3 > /proc/sys/vm/drop_caches

Then I ran `singularity build` again and the container built.
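Keep in mind that `free` counts reclaimable page cache toward "available", so a large available figure says little about how much cache the kernel is holding. A minimal sketch for watching the relevant `/proc/meminfo` fields before and after dropping caches; the helper name `meminfo_gib` is hypothetical. Writing to `/proc/sys/vm/drop_caches` requires root, and `sudo echo 3 > ...` does not work because the redirection runs in the unprivileged shell; piping into `sudo tee` does.

```shell
#!/bin/sh
# Hypothetical helper: print a /proc/meminfo field (reported in kB) in GiB.
# An optional second argument points at a different meminfo-format file.
meminfo_gib() {
    field="$1"
    file="${2:-/proc/meminfo}"
    # Match the exact field name so "Cached:" does not also match "SwapCached:".
    awk -v f="$field:" '$1 == f { printf "%.1f\n", $2 / 1048576 }' "$file"
}

echo "MemAvailable: $(meminfo_gib MemAvailable) GiB"
echo "Cached:       $(meminfo_gib Cached) GiB"
# To drop caches as a sudo-capable (non-root) user:
#   sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
```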

Still, if anyone has any other tips for managing memory with Singularity for large containers, I'm all ears.

Frankie Robertson

Mar 6, 2021, 3:26:43 AM
to singularity, Mark Miesch
Hi Mark,

I am having similar problems. Are you able to reduce it to a mksquashfs problem by finding the build-temp directory and running `mksquashfs rootfs/ test -noappend -all-root -info`? If you do this, you can find the file at which it freezes. In my case it is always the same place, although it does not happen on every build. This is using mksquashfs version 4.2-git-stable (2013/06/21).
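The isolation step above can be scripted roughly as follows. This is a sketch that assumes an in-progress build leaves a `build-temp-*` directory containing `rootfs/` under the build temp location (`$SINGULARITY_TMPDIR`, else `/tmp`); the helper name `find_build_rootfs` and the output file `test.sqfs` are hypothetical, while the mksquashfs flags are the ones from the message. Run it while the hung build's temp directory still exists.

```shell
#!/bin/sh
# Hypothetical helper: list rootfs directories of in-progress Singularity
# builds, assumed to live at <tmpdir>/build-temp-*/rootfs.
find_build_rootfs() {
    base="${1:-${SINGULARITY_TMPDIR:-/tmp}}"
    for d in "$base"/build-temp-*/rootfs; do
        # The glob stays unexpanded when nothing matches; skip that case.
        [ -d "$d" ] && echo "$d"
    done
    return 0
}

# Re-run the squash step by hand. -info prints each file as it is added,
# so the last file printed before a stall points at where it freezes.
for rootfs in $(find_build_rootfs); do
    mksquashfs "$rootfs" ./test.sqfs -noappend -all-root -info
done
```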

Best regards,
Frankie

Frankie Robertson

Mar 6, 2021, 6:43:45 AM
to singularity, Frankie Robertson, Mark Miesch
Hi Mark,

I think I've got to the bottom of my problem now, and my solution might be able to help you. The problem is probably related to this one: https://github.com/hpcng/singularity/issues/5434 . I think you need to limit how much memory mksquashfs uses with the `mksquashfs mem` configuration variable in your singularity.conf. I suppose setting it to a large fraction of your RAM, e.g. 1/4 to 1/2, will do the trick. Soon you'll be able to do this with environment variables instead: https://github.com/hpcng/singularity/pull/5805
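The `mksquashfs mem` setting is a line in `singularity.conf` taking a size such as `1G` or `500M`. The file's location depends on the install (commonly `/usr/local/etc/singularity/singularity.conf` for source builds or `/etc/singularity/singularity.conf` for packages). As far as I know, the value is passed to mksquashfs's own `-mem` option, which squashfs-tools added in 4.3. The sketch below turns the 1/4 to 1/2 suggestion into a candidate line; the function name and the default divisor of 4 are assumptions for illustration, not a tested recommendation.

```shell
#!/bin/sh
# Hypothetical helper: print a candidate singularity.conf line capping
# mksquashfs memory at 1/DIVISOR of total RAM, in whole GiB (rounded down).
# Args: total memory in kB (default: MemTotal from /proc/meminfo), divisor (default 4).
mksquashfs_mem_line() {
    total_kb="${1:-$(awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo 2>/dev/null)}"
    [ -n "$total_kb" ] || total_kb=0
    divisor="${2:-4}"
    gib=$(( total_kb / 1048576 / divisor ))
    # Never suggest less than 1G.
    [ "$gib" -lt 1 ] && gib=1
    echo "mksquashfs mem = ${gib}G"
}

mksquashfs_mem_line
```

For example, a machine with exactly 128 GiB (134217728 kB) gives `mksquashfs mem = 32G` with the default divisor and `64G` with a divisor of 2.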

I may have had the same problem, but I think the underlying cause in my case was ulimits capping the amount of memory. The other thing is that an old version of mksquashfs, e.g. 4.2, seems not to allocate memory up front: it looks like the worker threads allocate memory themselves, and when an allocation fails they just die. (I don't know for a fact that this is what's happening; it's just a guess at this point.)
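To test the ulimit theory, `ulimit -v` (virtual memory, in kB) in the shell that runs the build shows whether a cap is in effect; `unlimited` means none. Limits are per process, so check under the same conditions as the build, e.g. `sudo sh -c 'ulimit -v'` if you build with sudo. A minimal sketch with a hypothetical helper `limit_allows` that compares such a value against a required amount (32 GiB here, an arbitrary example):

```shell
#!/bin/sh
# Hypothetical helper: succeed if a ulimit value (kB, or "unlimited")
# allows at least REQUIRED_KB.
limit_allows() {
    value="$1"
    required_kb="$2"
    [ "$value" = "unlimited" ] && return 0
    [ "$value" -ge "$required_kb" ]
}

# Example: does the current virtual-memory limit allow 32 GiB (33554432 kB)?
if limit_allows "$(ulimit -v)" 33554432; then
    echo "virtual memory limit OK: $(ulimit -v)"
else
    echo "virtual memory limit too low: $(ulimit -v) kB"
fi
```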

In conclusion, based on my own attempts to fix my problem, here is what I would try in your case:
1) Make sure you have an up-to-date version of mksquashfs (>= 4.3).
2) Try the `mksquashfs mem` configuration variable.
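For step 1, `mksquashfs -version` prints a first line in the format quoted earlier in this thread (`mksquashfs version 4.2-git-stable (2013/06/21)`). A sketch that parses that line and checks for 4.3 or newer; the helper name `squashfs_version_ok` is hypothetical and the parsing assumes that first-line format.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a "mksquashfs version X.Y..." line
# reports version 4.3 or newer.
squashfs_version_ok() {
    # Extract MAJOR and MINOR, e.g. "4 2" from "4.2-git-stable".
    ver=$(echo "$1" | sed -n 's/^mksquashfs version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$ver" ] || return 1
    set -- $ver
    [ "$1" -gt 4 ] || { [ "$1" -eq 4 ] && [ "$2" -ge 3 ]; }
}

first_line=$(mksquashfs -version 2>&1 | head -n 1)
if squashfs_version_ok "$first_line"; then
    echo "OK: $first_line"
else
    echo "Upgrade squashfs-tools (need >= 4.3): $first_line"
fi
```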

Best regards,
Frankie

Mark Miesch

Mar 12, 2021, 4:54:31 PM
to singularity, frankie....@gmail.com, Mark Miesch

Thanks, Frankie, for taking the time to write. I'm using version 4.3 of mksquashfs, so I went ahead and modified the `mksquashfs mem` setting in my singularity.conf file as you suggested. So far so good; hopefully this will continue to help in future builds.