issues with juicer and 3D-DNA pipeline


XueFei Lee

Sep 15, 2017, 1:27:03 AM
to 3D Genomics

With the software, I can successfully reproduce the NA12878 assembly. However, when I tried to use 3d-dna to scaffold my own de novo assembly, it failed (I did get results with LACHESIS). The species is Arabidopsis thaliana. The input files are:

Draft de novo assembly (121M) containing 720 contigs

32G hi-c reads

I ran the juicer pipeline and got an 800M (bwa aln) or a 12G (bwa mem) merged_nodups.txt file. Then I used these files as the input to the 3d-dna pipeline.

When I ran in haploid mode, it ended up with a FINAL fasta file containing one big unsplit scaffold and many small scaffolds. (The stderr shows "Chromosome boundary position not positive! Chromosome splitter failed. Refer to the hic map: continuing without splitting")

In diploid mode, it ended up with a FINAL fasta file containing many small scaffolds similar to the input contigs (no error messages).

juicer command:

```
juicer.sh -d /mydirpath/ -D /mydirpath/ \
 -S early -r -R 2 -z references/p_ctg.fa \
 -s HindIII -y restriction_sites/p_ctg_HindIII.txt \
 -p restriction_sites/p_ctg.chrom.sizes
```

3d-dna command:

```
sh 3d-dna/run-pipeline.sh -m haploid -t 10000 -s 2 -c 5 p_ctg.fa at.mnd.txt
sh 3d-dna/run-pipeline.sh -m diploid -t 10000 -s 2 -c 5 p_ctg.fa at.mnd.txt
```

I have noticed that the stderr of the juicer pipeline shows:

'Error! The number of reads in the fastqs (46859770) is not the same as the number of reads reported in the stats (46913927), likely due to a failure during alignment.Reads don't add up.'

I checked this forum and found someone had a similar problem with juicer:

https://groups.google.com/forum/#!topic/3d-genomics/aalqyooC9u8

But my read names look like this:

@ST-E00494:70:H35YGALXX:5:1101:9780:1309 2

I am not sure what causes this failure in juicer.

If any intermediate file is needed to diagnose this issue, please let me know.

Thank you.

Olga Dudchenko

Sep 17, 2017, 10:14:58 PM
to 3D Genomics
Hello XueFei,

Note that the published 3d-dna pipeline aims to reproduce the results of the Zika mosquito paper and will require tuning to apply to genomes of radically different chromosome sizes and/or contigs generated with a different technology. Let me know if you would like any suggestions with respect to tuning or if you would like me to help you assemble: this would, however, require sharing some of the intermediate files generated by the pipeline.

With respect to the juicer problem, since your library is not too big, I would suggest simply rerunning to rule out a rogue alignment job failure, as suggested by the error message. If the problem persists, it would help the diagnosis if you shared a fragment of the R1 and R2 fastq files: it seems likely that this is some read name discrepancy, just not the same type as in the previous forum post.
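
For anyone wanting to check for a read name discrepancy locally before sharing files, here is a minimal sketch. It is not part of juicer: `read_names` and `count_name_mismatches` are hypothetical helper names, and the sketch assumes standard 4-line FASTQ records whose read name is the first whitespace-delimited token of the header, optionally ending in `/1` or `/2`.

```shell
# Print the read name of every record: the first token of each header line,
# with a trailing /1 or /2 removed. Assumes 4-line FASTQ records.
read_names() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }'
}

# Print how many record positions carry different names in R1 and R2.
# A file with fewer records than its mate contributes one mismatch per
# missing record.
count_name_mismatches() {
    r1_names=$(mktemp)
    r2_names=$(mktemp)
    read_names "$1" > "$r1_names"
    read_names "$2" > "$r2_names"
    paste "$r1_names" "$r2_names" |
        awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }'
    rm -f "$r1_names" "$r2_names"
}

# Example call (file names are placeholders):
#   count_name_mismatches sample_R1.fastq.gz sample_R2.fastq.gz
```

A result of 0 means the two files pair up by name in order; anything else points to the first place to look.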

Best,
Olga

XueFei Lee

Sep 20, 2017, 3:21:28 AM
to 3D Genomics

Hi Olga,

Thank you for your reply.

Attached are head -n 100 and tail -n 100 of the fastq files. Hope this helps to solve the problem.

 

Best

Xuefei 


On Monday, September 18, 2017 at 10:14:58 AM UTC+8, Olga Dudchenko wrote:
at_R1.fastq.gz
at_R2.fastq.gz

Neva Durand

Sep 21, 2017, 9:32:30 AM
to XueFei Lee, 3D Genomics
Olga checked the fastqs and they are fine from a read name perspective.

What is the correct number of reads in your fastqs?  As Olga suggested, it is quite possible alignment failed, as the message suggests, though it is surprising that the number of reads in the statistics is higher than the line count.
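
One way to get that number independently of the pipeline is to count lines and divide by four. The following is only a sketch under the assumption of standard 4-line FASTQ records; `count_fastq_reads` is a hypothetical helper name, not a juicer command.

```shell
# Print the number of reads in a FASTQ file (plain or gzipped),
# computed as line count / 4. Warns on stderr if the line count is not
# a multiple of 4, which would suggest a truncated or malformed file.
count_fastq_reads() {
    case "$1" in
        *.gz) n_lines=$(gzip -dc "$1" | awk 'END { print NR }') ;;
        *)    n_lines=$(awk 'END { print NR }' "$1") ;;
    esac
    if [ $(( n_lines % 4 )) -ne 0 ]; then
        echo "warning: $1 has $n_lines lines, not a multiple of 4" >&2
    fi
    echo $(( n_lines / 4 ))
}

# Example calls (file names are placeholders):
#   count_fastq_reads sample_R1.fastq.gz
#   count_fastq_reads sample_R2.fastq.gz
```

R1 and R2 should give the same count; that count can then be compared with the two numbers in the error message.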




--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Michał T. Lorenc

Aug 8, 2018, 10:23:04 PM
to 3D Genomics
Hi Olga,
We have a 3.1 G allotetraploid plant genome with 19 chromosomes. What kind of settings would you recommend?

Thank you in advance,

Best wishes,

Michal

Olga Dudchenko

Aug 9, 2018, 11:27:29 AM
to 3D Genomics
Hi Michal,

I would suggest running 3D-DNA with default parameters and checking the output in Juicebox Assembly Tools: depending on how divergent the genomes are and how collapsed the draft is, you may or may not have to tune at all. For faster results, either simply wait for the round 0 results to come in, or run 3D-DNA with the --early-exit flag and then run bash ${pipeline}/edit/run-mismatch-detector.sh and bash ${pipeline}/edit/run-coverage-analyzer.sh, to examine the output .0.hic, .0.wig and .0.bed files in Juicebox Assembly Tools.

Loading these into JBAT, you'll be able to tell how much you are losing with the default mapping quality threshold, whether your coverage is relatively uniform, how many misjoins you have, and whether the misjoin detector is confused by alignment biases. Some helpful info can be found here: http://aidenlab.org/assembly/manual_180322.pdf. Happy to give feedback if you need any help interpreting the results.
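
As a rough command-line complement to the visual check of mapping quality, the sketch below counts how many read pairs in a merged_nodups file would survive a given MAPQ cutoff. It is not part of 3D-DNA: `mapq_pass_fraction` is a hypothetical helper, and it assumes the long merged_nodups format in which columns 9 and 12 hold the MAPQ of the two read ends. Please verify that layout against your own file before trusting the numbers.

```shell
# Report how many pairs have both ends at or above a MAPQ threshold.
# $1: merged_nodups file, $2: MAPQ threshold.
# ASSUMPTION: columns 9 and 12 are mapq1 and mapq2 (long format).
mapq_pass_fraction() {
    awk -v q="$2" '
        {
            total++
            a = $9 + 0; b = $12 + 0
            m = (a < b) ? a : b          # the weaker end decides
            if (m >= q + 0) pass++
        }
        END {
            if (total)
                printf "%d of %d pairs (%.1f%%) have min MAPQ >= %d\n",
                       pass, total, 100 * pass / total, q
            else
                print "no records"
        }
    ' "$1"
}

# Example call (file name and threshold are placeholders):
#   mapq_pass_fraction merged_nodups.txt 1
```

Running it with a few different thresholds gives a quick sense of how steeply the usable contact count drops as the cutoff rises.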

With best wishes,
Olga 