Juicer software suite performance

Johnson Zhang

Nov 14, 2020, 11:12:31 PM

to 3D Genomics

Hello, Neva,

Thanks for developing Juicer. It's been an awesome tool for processing Hi-C data!

While using it for my project, I encountered some minor issues. Although these have not prevented me from generating results, I would like to get some input from you:

1. The Juicer.sh pipeline seems to generate large sam files while aligning the reads. This caused some 'out of disk space' errors for samples I was working on, which yielded 400M reads. The sam files grew to 400-500GB, and the total size of the work directory exceeded 1TB. Has the Juicer team considered piping the series of commands, or using binary files, to ease the I/O burden?

2. The current Juicer pipeline spreads its data processing functions across several sh/awk/perl/Java jar files, usually found in the 'common' folder. Are there any virtual environments, such as Conda, Docker, or Singularity instances, available? Since reproducibility is one of the major issues nowadays, having version control could help establish whether the results are reproducible.

3. The current JuicerTools works with CUDA 8.0 only. I have CUDA 11.1 on my machine, and JuicerTools failed to detect the GPU. How do I leverage the GPU resources then, instead of using the CPU version of HiCCUPS? I admit my Java programming sucks, and I found it a little tricky to recompile the jar files with the newest version of JCuda.jar. Any clues on how this can be done?
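
To make the piping idea in point 1 concrete, here is a rough sketch in Python of streaming one command's output straight into another, so that the intermediate text file is never written to disk. This is not how Juicer.sh works today. The function name `run_pipeline` and all file paths are made up for illustration, and whether the downstream Juicer scripts can consume a stream instead of a sam file is an assumption that would need checking.

```python
import subprocess


def run_pipeline(producer_cmd, consumer_cmd, out_path):
    """Stream producer stdout into consumer stdin and write the consumer's
    output to out_path, without storing the intermediate data on disk.

    Hypothetical helper for illustration; not part of Juicer.
    """
    with open(out_path, "wb") as out:
        producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(
            consumer_cmd, stdin=producer.stdout, stdout=out
        )
        # Close our copy of the pipe so the producer gets SIGPIPE
        # if the consumer exits early.
        producer.stdout.close()
        consumer_rc = consumer.wait()
        producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"pipeline failed: producer={producer_rc}, consumer={consumer_rc}"
        )


# Example call (assumes bwa and samtools are installed; paths are placeholders):
# run_pipeline(
#     ["bwa", "mem", "-SP5M", "-t", "8", "ref.fa", "r1.fastq", "r2.fastq"],
#     ["samtools", "view", "-b", "-"],
#     "aligned.bam",
# )
```

With a call like the commented one, the aligner output would be stored as compressed BAM rather than plain-text sam, which is usually several times smaller.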

I really appreciate your commitment to maintaining this software. Thank you so much in advance!

Kind regards,

Johnson

Neva Durand

Nov 16, 2020, 12:58:55 PM

to Johnson Zhang, 3D Genomics

Hi Johnson,

Thanks for the kind words!

We are working on an update to Juicer to do all of this. It should be ready fairly soon. We'll be releasing it through ENCODE, and an early version is here (this will change soon, though, as there are some bugs): https://github.com/ENCODE-DCC/hic-pipeline

There is also a Docker here: https://github.com/theaidenlab/Juicer-Docker

A member of our lab was also working on a Singularity version.

Re: GPU, I will have one of my colleagues respond.

Best

Neva

--

Neva Cherniavsky Durand, Ph.D. | she, her, hers

Assistant Professor | Molecular and Human Genetics

Aiden Lab | Baylor College of Medicine

www.aidenlab.org

Muhammad Saad Shamim

Nov 16, 2020, 1:05:42 PM

to Neva Durand, Johnson Zhang, 3D Genomics

Hi Johnson,

In most cases, CUDA has been pretty good about remaining backward-compatible, so you should presumably be able to use CUDA 8 (or even the earlier 7/7.5 versions) if the machine has CUDA 11, unless the backward compatibility has been removed.

Do you have a dedicated GPU? In order to test detection of the GPU, we have a quick script here:

https://github.com/sa501428/GPUTest

If you don't have a GPU, you can use the CPU version, or use a GPU on a university cluster/AWS instance. The GPU is only needed for HiCCUPS, which can be run after a hic file is built.
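
Before falling back to the CPU version, a quick first check is whether the NVIDIA driver sees a GPU at all. The sketch below is not the GPUTest script linked above. It simply calls `nvidia-smi -L`, and the function name `list_nvidia_gpus` is made up for illustration. It only shows whether the driver reports a device and says nothing about whether the JCuda version bundled with Juicer Tools can use it.

```python
import shutil
import subprocess


def list_nvidia_gpus(timeout=10):
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible.

    Hypothetical helper for illustration; it checks driver visibility only.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tools on PATH.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # Each device is reported on a line starting with "GPU <index>:".
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        print("GPUs visible to the driver:")
        for gpu in gpus:
            print("  " + gpu)
    else:
        print("No NVIDIA GPU detected via nvidia-smi.")
```

If this prints no devices, the problem is at the driver or hardware level rather than in Juicer Tools.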

Best,

- Muhammad Saad Shamim
