Parallelization parameters for juicer_tools pre


Andrea Garavito

unread,
Feb 2, 2021, 7:56:10 AM2/2/21
to 3D Genomics
Good day,
I'm running juicer_tools pre (v1.22.01) from the juicer.sh script (v1.6) on a cluster.
The script has finished creating merged_nodups.txt and is currently writing my .hic files, but it has now been on this step for more than a week.
I didn't realize that juicer.sh doesn't parallelize this step, even though juicer_tools pre is capable of it, until I noticed how long it was taking.
Consequently, I'd like to re-run juicer_tools pre with parallelization, but I'm not sure about the right parameters to use to optimize it.

What would be suitable values for (1) -j (number of CPU threads to use), (2) --threads (number of threads), (3) the free memory, and (4) the virtual memory needed per CPU, given that my merged_nodups.txt is 241 GB, my genome assembly is about 2.4 Gb, and I'm using a cluster with good capabilities?
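[Editor's note: for readers with the same question, a parallelized invocation might look like the sketch below. This is an illustration, not an official recommendation: the thread count, Java heap sizes, and temporary-directory path are placeholder assumptions to tune to your own allocation.]

```shell
# Sketch only: heap sizes, thread count, and paths are placeholders.
# --threads sets the number of worker threads for pre;
# -t points at a (preferably fast/local) temporary directory.
java -Xms24g -Xmx48g -jar juicer_tools.jar pre \
  --threads 8 \
  -t /scratch/tmp_pre \
  merged_nodups.txt out.hic genome.chrom.sizes
```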

Thank you for your comments
Andrea



Neva Durand

unread,
Feb 2, 2021, 11:32:16 AM2/2/21
to Andrea Garavito, 3D Genomics
Hi Andrea,

It is unusual for Pre to take this long; are you by chance trying to create a .hic file from an assembly with many contigs? Note that to use 3D-DNA, you should not create a .hic file in this way.

Best
Neva



--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine

Andrea Garavito

unread,
Feb 3, 2021, 9:28:28 AM2/3/21
to Neva Durand, 3d-ge...@googlegroups.com
Hi again, Neva.
I was looking at the run-asm-pipeline.sh pipeline, but I'm confused about how to handle its parallelisation when submitting it to a job manager such as Slurm.
I don't see how to match the number of parallel jobs that the script's commands create through the GNU Parallel dependency to the number of CPUs allocated on the cluster.
Do you have a suggestion for running it as a Slurm or SGE job?
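[Editor's note: one common pattern, sketched below, is to give the pipeline a whole-node (or large single-task) allocation so that all of its internal GNU Parallel sub-processes run inside the CPUs Slurm granted. Resource numbers and paths are placeholder assumptions, not recommendations from the pipeline authors.]

```shell
#!/bin/bash
#SBATCH --job-name=3ddna
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16   # CPUs available to the pipeline's parallel sub-jobs
#SBATCH --mem=128G
#SBATCH --time=72:00:00

# Hypothetical wrapper: running everything as one task on one node keeps
# the pipeline's GNU Parallel workers inside the Slurm allocation.
cd /path/to/workdir
bash run-asm-pipeline.sh assembly.fasta merged_nodups.txt
```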

Thank you in advance.
Andrea

On Tue, Feb 2, 2021 at 6:29 PM, Andrea Garavito (<nea...@gmail.com>) wrote:
Thank you Neva, I'll do as suggested.
Best
Andrea

On Tue, Feb 2, 2021 at 6:25 PM, Neva Durand (<ne...@broadinstitute.org>) wrote:
Yes, exactly. As it is, you can kill your job and proceed with the Cookbook since you have your merged_nodups.

On Tue, Feb 2, 2021 at 12:17 PM Andrea Garavito <nea...@gmail.com> wrote:
Thank you for your comment, Neva.
Yes, I have an assembly with 28,735 contigs.
I ran the juicer.sh script as explained in chapter 3 of the Genome Assembly Cookbook. Should I have used the "early exit" flag and then processed the resulting merged_nodups.txt with the 3D-DNA pipeline (run-asm-pipeline.sh)?
Best
Andrea
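[Editor's note: for readers following this workflow, the sequence discussed above might look like the sketch below. It assumes the early-exit flag is -e (as in juicer.sh's usage message for this version); genome ID, reference, and chrom.sizes paths are placeholders.]

```shell
# Sketch only; paths and genome ID are placeholders.
# -e asks juicer.sh to stop after producing merged_nodups.txt,
# skipping .hic creation; 3D-DNA then builds its own .hic files
# during scaffolding.
bash juicer.sh -e -g mygenome -z references/contigs.fasta -p chrom.sizes
bash run-asm-pipeline.sh references/contigs.fasta aligned/merged_nodups.txt
```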