Singularity + Snakemake: Snakemake does not recognize that files have already been created when running inside a Singularity container.


Carlos Guzman

May 1, 2018, 4:31:22 PM
to singularity
I have noticed an odd problem when running Snakemake workflows inside Singularity containers. By default, Snakemake will not re-run samples that it has already completed, so if the workflow ends abruptly or a new sample file is added to an existing directory of samples, only the new samples and those that have not been through the complete pipeline are run. However, this does not happen when you run a Snakemake workflow inside a Singularity container: it runs every single sample through every single rule, regardless of whether the output already exists. My assumption is that this is because Snakemake relies on a .snakemake directory that it creates, which holds information on the various temp files for every sample, and that directory doesn't exist within Singularity.

Any idea how to get around this? It's not a use case that comes up very often, but we have run into the issue where the pipeline hangs when running a huge number of samples, and we have to restart the entire process instead of only finishing what wasn't complete.

Thanks!

Michael Bauer

May 1, 2018, 4:53:35 PM
to singu...@lbl.gov
Hi Carlos,

Is it possible to give snakemake an argument to use a specified directory instead of .snakemake? If so, you could point it at /tmp/.snakemake and it should work as expected. If not, I would try bind mounting an empty directory from the host into the container at the expected location of the .snakemake directory. Let me know if either of those solutions works for you.
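
A minimal sketch of the bind-mount idea, for illustration only. It assumes Singularity's `exec --bind host:container` option and Snakemake's `--directory` option (which sets the working directory; as far as I know, .snakemake is created inside it). The image name, host path, and function name are hypothetical, and this is not the invocation used by the pipeline discussed in this thread.

```python
import shlex


def build_singularity_command(image, host_dir, container_dir="/scif/data"):
    """Build an argv list that runs snakemake with a host directory bound in.

    host_dir is a persistent directory on the host, so anything Snakemake
    writes under container_dir (including .snakemake) survives between runs.
    All argument values here are hypothetical examples.
    """
    return [
        "singularity", "exec",
        "--bind", f"{host_dir}:{container_dir}",  # host path : container path
        image,
        "snakemake", "--directory", container_dir,
    ]


if __name__ == "__main__":
    cmd = build_singularity_command("pipeline.simg", "/home/user/run1")
    # Print the command rather than executing it, so it can be inspected first.
    print(shlex.join(cmd))
```

Pass the list to `subprocess.run` to execute it once the printed command looks right.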

Cheers,
Michael

--
You received this message because you are subscribed to the Google Groups "singularity" group.
To unsubscribe from this group and stop receiving emails from it, send an email to singularity...@lbl.gov.

v

May 1, 2018, 7:40:43 PM
to singu...@lbl.gov
Hey Carlos,

Did you try setting the working directory in Snakemake to be the temporary directory where you are running things? Ideally you would either have one subdirectory in /tmp assigned per subject run (so you could go back and find it again to update the run for that subject) OR be sure to clean up after you do the run. If your Snakefile is being run from the base of a directory in the container where it doesn't or can't save its state, then it would be logical that it's starting over again! What I did for snakemake.scif (you have probably seen this, but in case not) is to always cd to the mounted folder first (see here) and to copy a (fresh) Snakefile there each time (here in the setup app). Then, in the Snakefile, I make sure to set the workdir to always be this same spot in the container. You get to decide where it binds on the host, and that is where you either maintain or clean the .snakemake folder.

I think if Snakemake works like make, it wouldn't rely just on a directory but also on the existence of inputs and outputs where they are expected, but I'm not totally sure. Too many snakey thoughts !! :O
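
For reference, the make-style check described above can be sketched as follows. This is a simplified illustration of the general idea (outputs must exist and be no older than their inputs), not Snakemake's actual implementation; the function name is hypothetical.

```python
import os


def is_up_to_date(inputs, outputs):
    """Return True if a make-like tool could skip the job.

    The job is considered up to date when every output exists and the
    oldest output is at least as new as the newest input. Inputs are
    assumed to exist; a missing input raises FileNotFoundError.
    """
    # Any missing output means the job has to run.
    if not all(os.path.exists(path) for path in outputs):
        return False
    # With no inputs, existing outputs are enough.
    if not inputs:
        return True
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return oldest_output >= newest_input
```

Under this model, anything that rewrites input files on each run (for example, copying them into place again) gives them a newer modification time and would make existing outputs look stale.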

Best,

Vanessa




--
Vanessa Villamia Sochat
Stanford University '16

Carlos Guzman

May 1, 2018, 7:48:33 PM
to singularity
Hi Vanessa,

I have based the pipeline almost entirely on your snakemake.scif GitHub repository. So I am always cd-ing into the mounted folder first and copying a fresh Snakefile there each time, unless the Snakefile already exists. I am also setting the workdir to /scif/data so that it is always that same spot in the container. The .snakemake directory is visible in the directory that I bind (as shown in your snakemake.scif example).

Perhaps I need to remove the Snakefile after every run to ensure a fresh Snakefile is copied there? I'm not sure that would fix the problem though. You can actually find the pipeline here: https://github.com/BennerLab/pipelines/tree/master/chip-seq.scif in case you're interested in taking a closer look.

v

May 1, 2018, 7:57:23 PM
to singu...@lbl.gov
I think you would need to remove the entire .snakemake directory, which is a different thing from the Snakefile. For my pipeline I only needed it to run once (no repeated subjects), so I'd just delete all the data that was produced by the first run and it would run again. So I would be sure to remove the .snakemake folder (if it exists), and then, if the issue still arises, perhaps consider making a fresh temporary directory for each (different) subject.

I need to make dinner but I can take a look tomorrow if you are having trouble still!


v

May 1, 2018, 8:07:13 PM
to singu...@lbl.gov
And to debug, I would do a sanity check: at the start of a run, print out what snakemake knows to be the inputs, what it knows to be the outputs, and which of these do and don't exist. If you do that for two consecutive runs, we should at least get an understanding of its state and go from there.
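
One way to do such a check by hand is sketched below. It is a generic helper, not part of Snakemake, and the file paths in the example are hypothetical placeholders for whatever the workflow's rules declare. Snakemake's own `--dry-run` and `--summary` options may report similar information.

```python
import os
import time


def file_states(paths):
    """Return (path, exists, mtime) tuples; mtime is None for missing files."""
    states = []
    for path in paths:
        exists = os.path.exists(path)
        mtime = os.path.getmtime(path) if exists else None
        states.append((path, exists, mtime))
    return states


def format_report(label, paths):
    """Format one line per file showing whether it exists and when it changed."""
    lines = [f"{label}:"]
    for path, exists, mtime in file_states(paths):
        if exists:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
            lines.append(f"  present  {stamp}  {path}")
        else:
            lines.append(f"  MISSING  {path}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names; substitute the inputs and outputs of a real rule.
    print(format_report("inputs", ["/scif/data/sample1.fastq"]))
    print(format_report("outputs", ["/scif/data/sample1.bam"]))
```

Running this inside the container before each of two consecutive runs and comparing the reports shows whether outputs are missing or whether their timestamps changed between runs.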


Paolo Di Tommaso

May 3, 2018, 2:27:05 AM
to singu...@lbl.gov
You are using the wrong workflow framework 😂

p
