Parallelization parameters for juicer_tools pre


Andrea Garavito

unread,
Feb 2, 2021, 7:56:10 AM2/2/21
to 3D Genomics
Good day,
I'm running juicer_tools pre (v1.22.01) from the juicer.sh script (v1.6) on a cluster.
The script has finished creating merged_nodups.txt and is currently writing my .hic files, but it has now been on this step for more than a week.
I didn't realize that juicer.sh doesn't parallelize this step, even though juicer_tools pre is capable of it, until I noticed how long it was taking.
Consequently, I'd like to re-run juicer_tools pre with parallelization, but I'm not sure about the right parameters to use to optimize it.

What would be suitable values for (1) -j (number of CPU threads to use), (2) --threads (number of threads), (3) the free memory, and (4) the virtual memory needed per CPU, given that my merged_nodups.txt is 241 GB, my genome assembly is about 2.4 Gb, and I'm using a cluster with good capabilities?
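[Editor's note: for readers with the same question, a parallelized invocation might look like the sketch below. This is an illustration, not an official recommendation: the thread count, Java heap sizes, and temporary-directory path are placeholder assumptions to tune to your own allocation.]

```shell
# Sketch only: heap sizes, thread count, and paths are placeholders.
# --threads sets the number of worker threads for pre;
# -t points at a (preferably fast/local) temporary directory.
java -Xms24g -Xmx48g -jar juicer_tools.jar pre \
  --threads 8 \
  -t /scratch/tmp_pre \
  merged_nodups.txt out.hic genome.chrom.sizes
```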

Thank you for your comments
Andrea



Neva Durand

unread,
Feb 2, 2021, 11:32:16 AM2/2/21
to Andrea Garavito, 3D Genomics
Hi Andrea,

It is unusual for Pre to take this long; are you by chance trying to create a .hic file from an assembly with many contigs? Note that to use 3D-DNA, you should not create a .hic file in this way.

Best
Neva



--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine

Andrea Garavito

unread,
Feb 3, 2021, 9:28:28 AM2/3/21
to Neva Durand, 3d-ge...@googlegroups.com
Hi again, Neva.
I was looking at the run-asm-pipeline.sh pipeline, but I'm confused about how to handle its parallelisation when submitting it to a job manager such as Slurm.
I don't see how to match the number of parallel jobs that the script's commands create through the GNU Parallel dependency to the number of CPUs allocated on the cluster.
Do you have a suggestion for running it as a Slurm or SGE job?
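[Editor's note: one common pattern, sketched below, is to give the pipeline a whole-node (or large single-task) allocation so that all of its internal GNU Parallel sub-processes run inside the CPUs Slurm granted. Resource numbers and paths are placeholder assumptions, not recommendations from the pipeline authors.]

```shell
#!/bin/bash
#SBATCH --job-name=3ddna
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16   # CPUs available to the pipeline's parallel sub-jobs
#SBATCH --mem=128G
#SBATCH --time=72:00:00

# Hypothetical wrapper: running everything as one task on one node keeps
# the pipeline's GNU Parallel workers inside the Slurm allocation.
cd /path/to/workdir
bash run-asm-pipeline.sh assembly.fasta merged_nodups.txt
```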

Thank you in advance.
Andrea

On Tue, Feb 2, 2021 at 6:29 PM, Andrea Garavito (<nea...@gmail.com>) wrote:
Thank you Neva, I'll do as suggested.
Best
Andrea

On Tue, Feb 2, 2021 at 6:25 PM, Neva Durand (<ne...@broadinstitute.org>) wrote:
Yes, exactly. As it is, you can kill your job and proceed with the Cookbook since you have your merged_nodups.

On Tue, Feb 2, 2021 at 12:17 PM Andrea Garavito <nea...@gmail.com> wrote:
Thank you for your comment, Neva.
Yes, I have an assembly with 28,735 contigs.
I ran the juicer.sh script as explained in chapter 3 of the Genome Assembly Cookbook. Should I have used the "early exit" flag and then processed the resulting merged_nodups.txt with the 3D-DNA pipeline (run-asm-pipeline.sh)?
Best
Andrea
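[Editor's note: for readers following this workflow, the sequence discussed above might look like the sketch below. It assumes the early-exit flag is -e (as in juicer.sh's usage message for this version); genome ID, reference, and chrom.sizes paths are placeholders.]

```shell
# Sketch only; paths and genome ID are placeholders.
# -e asks juicer.sh to stop after producing merged_nodups.txt,
# skipping .hic creation; 3D-DNA then builds its own .hic files
# during scaffolding.
bash juicer.sh -e -g mygenome -z references/contigs.fasta -p chrom.sizes
bash run-asm-pipeline.sh references/contigs.fasta aligned/merged_nodups.txt
```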