I'm trying to build some fairly large containers that include Intel oneAPI compilers and runtime libraries.
I keep finding that the `singularity build` command hangs at the stage where it is trying to build the SIF file:

```
INFO:    Creating SIF file...
```
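For context, the build invocation is essentially this (the file names here are placeholders, not my real ones):

```
# placeholder names; my definition file bootstraps from an existing local image
# (Bootstrap: localimage / From: oneapi-base.sif in the .def header)
sudo singularity build oneapi-new.sif oneapi-new.def
```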
The build can sit there for over an hour if I let it, but eventually my SSH session drops with:

```
client_loop: send disconnect: Broken pipe
```
The container builds from a previous container, which is 3.1 GB in size; when the build succeeds, the new container is 7.6 GB.
I am building this on an AWS instance running Ubuntu 18.04 that was specifically configured for memory-intensive workloads. It has 128 GB of memory, so it should be fine: `free` shows 110 GB available, and watching `top` while the build runs, it looks like I have plenty of memory. I also have 250 GB of free disk space.
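Concretely, the checks behind that claim (exact flags here are illustrative):

```
free -h   # reports roughly 110 GB available
df -h     # roughly 250 GB free on the build filesystem
top       # watched while the build runs; memory use stays well under 128 GB
```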
Yet, even with all this free space, sometimes I can get it to work by running:

```
docker system prune -a
sudo singularity cache clean
```

and then clearing out `/var/tmp` and `/tmp`.
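For that last step, "clearing out" just means removing everything under those two directories, roughly:

```
# careful: this deletes every user's temp files on the machine
sudo rm -rf /tmp/* /var/tmp/*
```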
This worked on Friday, but today even this doesn't work: I have tried it three times and the build hangs every time.
This seems like a cumulative problem: builds work until they don't, which suggests some kind of memory or cache is filling up. But I don't know what else to clean; as far as I can tell there should be plenty of memory available. Is there somewhere else that Singularity stores cached images? Anywhere else to clean? Is there something I am missing?
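For reference, these are the cache-related locations I already know about and have been checking (defaults as I understand them, in case my mental model is wrong):

```
singularity cache list        # what the cache currently holds
echo "$SINGULARITY_CACHEDIR"  # cache dir override (default is ~/.singularity/cache)
echo "$SINGULARITY_TMPDIR"    # build temp dir override (default is /tmp)
```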
I just dropped the kernel caches and am trying one more time.
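(For completeness, by "dropped the kernel caches" I mean the usual:)

```
# drops the page cache plus dentries and inodes
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
```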
Thanks for any insights you may have.