Juicer on multiple fastq.gz files

Jenny

Jan 8, 2020, 7:59:58 PM

to 3D Genomics

Hello,

I have multiple fastq.gz files from a single sample, and I want to run Juicer to combine all of the reads into one Hi-C map without first merging the fastq files into a single file. Is this possible, and how should I organize the directories?

Thanks,

Jenny

Neva Durand

Jan 9, 2020, 8:29:45 AM

to Jenny, 3D Genomics

Yes. Just be sure the read1 and read2 files have the same stem (and have R1 and R2 in the name), and put them all in the fastq directory. See the documentation here for more info:

GitHub.com/aidenlab/juicer/wiki

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/e0880ca7-cd49-4de6-8c48-781dc83cf68c%40googlegroups.com.

--

Neva Cherniavsky Durand, Ph.D.

Pronouns: she, her, hers

Assistant Professor, Aiden Lab

www.aidenlab.org

Jiani Yin

Jan 9, 2020, 2:21:33 PM

to Neva Durand, 3D Genomics

Hi Neva,

Thank you for your rapid response.

So in the fastq/ directory, I can for example have:

Batch1_R1.fastq.gz

Batch1_R2.fastq.gz

Batch2_R1.fastq.gz

Batch2_R2.fastq.gz

Batch3…

And then, when I run juicer.sh, it will take all the fastq.gz files and generate a single map?

Thanks,

Jenny

Neva Durand

Jan 9, 2020, 3:08:11 PM

to Jiani Yin, 3D Genomics

Yes, exactly.
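
For anyone who wants to double-check the naming before a long run, here is a minimal Python sketch (not part of Juicer) that lists fastq files whose mate is missing. It assumes the mates are distinguished only by `_R1` / `_R2` in the file name, as in the Batch example above; Juicer's own matching rules may differ, so treat the wiki as the reference. The names `PAIR_RE` and `find_unpaired` are made up for this illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: read 1 and read 2 differ only by "_R1" vs "_R2" in the name,
# e.g. Batch1_R1.fastq.gz / Batch1_R2.fastq.gz (the example from this thread).
PAIR_RE = re.compile(r"^(?P<stem>.+)_R(?P<read>[12])(?P<rest>.*\.fastq(?:\.gz)?)$")


def find_unpaired(fastq_dir):
    """Return the sorted names of fastq files in fastq_dir that have no mate."""
    mates = defaultdict(dict)
    for path in Path(fastq_dir).iterdir():
        match = PAIR_RE.match(path.name)
        if match:
            # Files sharing the same stem and suffix belong to one pair.
            key = (match["stem"], match["rest"])
            mates[key][match["read"]] = path.name
    unpaired = []
    for reads in mates.values():
        if len(reads) != 2:
            unpaired.extend(reads.values())
    return sorted(unpaired)
```

Calling `find_unpaired("fastq")` from the top-level working directory returns an empty list when every R1 file has a matching R2 file. Files that do not look like fastq files are ignored.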

Etienne Danis

May 12, 2021, 1:58:42 PM

to 3D Genomics

Dear Neva,

Thank you very much for creating this whole suite of tools for Hi-C data processing!

And thank you very much for patiently answering the questions from so many users of your tools!

Based on your previous answer (see above), the juicer.sh script will be able to process all the fastq files from replicates as long as they are located in the same directory and they have the same stem name (and have R1 and R2 in the name).

Is this the case for all juicer.sh scripts (I'm trying to decide between the CPU juicer.sh script and the LSF juicer.sh script)?
Is it still necessary to use the mega.sh script to combine all the data into a single .hic file?

When is mega.sh useful to run?

Thank you very much in advance for your help!

Best regards,

Etienne

Neva Durand

May 12, 2021, 2:01:28 PM

to Etienne Danis, 3D Genomics

Hello,

With a lot of reads, the CPU version will take an inordinate amount of time to run. However, the LSF script is not really maintained because we don't have an LSF system with which to debug it. If you have a billion-read map, you will have to use LSF. For 100M reads, you can try CPU, but it might take a long time to run, particularly the merge sort.

The mega.sh script is for combining replicates.

Best

Neva

--

Neva Cherniavsky Durand, Ph.D. | she, her, hers

Assistant Professor | Molecular and Human Genetics

Aiden Lab | Baylor College of Medicine

www.aidenlab.org

Etienne Danis

May 12, 2021, 4:03:15 PM

to 3D Genomics

Thank you very much, Neva, for your very prompt answer!
This is very good to know, and it makes sense.

Since I do not have access to an HPC system using SLURM or other schedulers, I'm wondering whether I should use AWS.

(I was initially planning to run the first part of Juicer, up to the .hic file, on LSF and then run HiCCUPS on AWS.)

Do the updates made to juicer.sh (for versions other than the LSF one) slightly (or drastically) improve loop calling, or do they just fix minor bugs or speed up processing?

Could the quality of the results of my analyses be improved if I used a more recent version of juicer (better sensitivity or resolution...)?

If I understood your last answer correctly, juicer.sh can process multiple fastq files from different sequencing runs of the same sample (batch1_R1.fastq.gz, batch1_R2.fastq.gz, batch2_R1.fastq.gz, batch2_R2.fastq.gz, batch3_R1.fastq.gz, batch3_R2.fastq.gz, ...) and create one .hic file with all the data combined, while mega.sh can combine .hic files from different replicates of the sample.

For example, I now have 1.2 billion reads for one sample split into 9 distinct fastq files (18 in total, counting R1 and R2). juicer.sh will automatically analyze all 9 fastq file pairs and give me one .hic file. At this stage, I won't need to use mega.sh.

But if, a bit later, I decide to get Hi-C data from the same sample at a different cell passage, obtain another 1.2 billion reads split into multiple fastq files, and combine those data into one .hic file, then to merge this new .hic file with the previous one I would have to run mega.sh to combine the two .hic files.

Am I correct?

Thank you very much in advance!
Best,

Etienne