More scaffolds and fragmentation after 3D-DNA


Xiaomeng

Mar 19, 2023, 6:59:03 PM
to 3D Genomics
Hi,

I am assembling a plant genome (2n = 2 × 19, genome size = 330 Mb). The draft genome is quite complete and at pseudo-chromosome level, with 130 scaffolds. The Juicer output shows a similar number of scaffolds to the draft, but after running 3D-DNA with default settings the assembly was split into 7144 scaffolds in resolved.assembly and 1444 scaffolds in FINAL.assembly, and there is no final.HiC. Only 0.assembly has the same number of scaffolds, but that is from before any processing such as misjoin correction and polishing, right?
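For comparing scaffold counts across the different .assembly outputs, here is a minimal Python sketch. It assumes the usual 3D-DNA .assembly layout, in which lines starting with `>` describe sequence fragments and every other non-empty line lists the signed fragment indices of one scaffold. The function name and the example text are hypothetical and only illustrate the idea.

```python
def count_assembly_entries(text):
    """Return (n_fragments, n_scaffolds) for the contents of a .assembly file.

    Assumed layout: '>' lines are fragment records, and every other
    non-empty line is one scaffold (a list of signed fragment indices).
    """
    fragments = 0
    scaffolds = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fragments += 1
        else:
            scaffolds += 1
    return fragments, scaffolds


# Hypothetical toy input: three fragments arranged into two scaffolds.
example = """>ctg1:::fragment_1 1 1000
>ctg1:::fragment_2 2 500
>ctg2 3 2000
1 -3
2
"""

print(count_assembly_entries(example))
```

Running this on the text of each .assembly file (0, resolved, FINAL) would show at which stage the number of fragments and scaffolds jumps.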

When I look back at the statistics reported by Juicer, only 62% of the paired reads are used, which leaves only 39M Hi-C contacts. I am not sure whether that is too low for the default pipeline. I have two questions:

1. Do you have any idea why 3D-DNA produces more scaffolds than the input had? Is it acceptable to just use the 0.hic version? (I am also quite confused by the multiple output files, and I found the manual not very informative. Is more detailed documentation available?)
2. How many Hi-C contacts would you suggest for a good Hi-C dataset? Is the low ratio in my data due to the high repeat content of this species?


Best wishes,
Xiaomeng


Attachments: juicer-stats.png, 3ddna-0.hic.png, 3ddna-2.hic.png, juicer-inter30.png

Olga Dudchenko

Mar 20, 2023, 12:39:48 PM
to 3D Genomics
Hi Xiaomeng,

Mostly what you see is caused by very uneven coverage. You might want to examine whether the coverage differences are expected (or perhaps due to under-collapsed heterozygosity that you need to address). Overall, your best path forward is probably to use the .0.hic (early-exit output) and look through the results in JBAT, given that you are pretty close to home and do not appear to need a whole lot of misjoin error correction. If you feel you need to do error correction, you can push the code to ignore the coverage issue by passing something like --editor-repeat-coverage 10.

Olga
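As a rough illustration of the last suggestion, the sketch below assembles a 3D-DNA command line with the `--editor-repeat-coverage 10` option from the reply. It only builds and prints the command and does not run anything. The script name `run-asm-pipeline.sh` and the argument order (options, then the draft FASTA, then the Juicer merged_nodups file) are assumptions based on the usual 3D-DNA invocation, and the file names are hypothetical placeholders.

```python
import shlex


def build_3ddna_command(fasta, mnd, repeat_coverage=10,
                        script="run-asm-pipeline.sh"):
    """Build (but do not run) a 3D-DNA command line as a list of arguments.

    Assumed call shape: <script> [options] <draft fasta> <merged_nodups>.
    """
    return [
        script,
        "--editor-repeat-coverage", str(repeat_coverage),
        fasta,
        mnd,
    ]


# Hypothetical input file names, for illustration only.
cmd = build_3ddna_command("draft.fasta", "merged_nodups.txt")
print(shlex.join(cmd))
```

Check the printed command against the help output of your installed 3D-DNA version before running it.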
