0.hic looks better than final.hic?


Ryan Martinez

Jul 22, 2024, 8:20:24 PM
to 3D Genomics
Hey all,

Thanks so much for your active support on these tools!

I've read the cookbook and the Dudchenko et al. 2017 supplemental, but I'm still confused by my Hi-C maps and I'm having trouble loading annotations in JBAT.

I'm working with a plant genome of approximately 850Mb that is estimated to have 10 chromosomes. I assembled it with Hifiasm, using the ultra-long flag, from 50x PacBio HiFi reads and ~11x coverage of Nanopore UL reads (>50kb). My draft genome is 95 scaffolds, but 95% of it is contained in the first 28. Based on my merged_nodups.txt file, it looks like I only have about 4x coverage. That makes sense, because the core that prepared my samples had a tough time getting DNA out of my leaves when doing Omni-C, so I know my coverage isn't ideal.
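
For anyone wanting to reproduce that kind of estimate, here is a minimal sketch of the arithmetic. It assumes each line of merged_nodups.txt is one deduplicated read pair and that the reads are 150 bp paired-end; the read length and both function names are assumptions for illustration, not values from my run.

```python
def estimate_hic_coverage(n_read_pairs, genome_size_bp, read_length_bp=150):
    """Rough sequence coverage from deduplicated read pairs.

    read_length_bp=150 is an assumed default; use the real read length.
    """
    # Each pair contributes two reads of read_length_bp bases.
    return n_read_pairs * 2 * read_length_bp / genome_size_bp


def count_lines(path):
    """Count lines (assumed: one read pair per line) without loading the file."""
    with open(path) as handle:
        return sum(1 for _ in handle)
```

Calling `estimate_hic_coverage(count_lines("merged_nodups.txt"), 850_000_000)` would then give an approximate fold coverage for an 850Mb genome.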

I've attached my 0.hic and final.hic maps for two runs: one with the default contig cutoff + editor repeat coverage 2, and another with -i 100,000 + editor repeat coverage 5.
  • When I visualize my 0.hic map, it looks like I can see 10 superscaffolds in my data, but when I load the final.hic map, the assembly is way more fragmented. Every map after 0.hic starts to look a lot more like the final one.
    • I basically see the same thing with either combination of cutoff and editor repeat coverage.
  • Another issue I'm having is that when I try to load my final.hic and final.assembly files into JBAT, none of the annotations appear on the map. I'm also unable to make any edits in my map, so I'm not really sure what to do from here.
The next parameter I'm going to adjust is editor coarse resolution, because it seems like there may be a coverage issue.

Thanks so much for your help!

Final hic -i 100000 --editor-repeat-coverage 5
final_hic_i100000_erc5.png
0 hic -i 100000 --editor-repeat-coverage 5
0_hic_i100000_erc5.png
Final hic -i 15000 --editor-repeat-coverage 2
final_hic_i15000_erc2.png
0 hic -i 15000 --editor-repeat-coverage 2
final_hic_i15000_erc2.png

Ryan Martinez

Jul 23, 2024, 7:50:12 PM
to 3D Genomics
Loaded the .wig file as well. Also realized the last 0.hic is the same image as the previous, but ignore that. It looks like the original 0.hic, as expected.

I ordered an additional sequencing run to increase coverage to at least 7x in the meantime.
Screen Shot 2024-07-23 at 3.18.35 PM.png

Olga Dudchenko

Jul 24, 2024, 10:52:50 AM
to 3D Genomics
I'm confused as to what the question is. I follow and support your overall logic. It seems to me that you have a relatively shallow dataset, so you might want to confine yourself to working at low resolutions on account of that. But overall you are getting where you need to be, as far as I can tell.

Best,
Olga

Ryan Martinez

Oct 8, 2024, 7:21:53 PM
to 3D Genomics
Hey Olga,

Sorry, I was initially confused because when I tried to load my assembly files into JBAT, none of the tracks for editing, chromosomes, or scaffolds would load. I've since figured out that if you load a different assembly file into a JBAT session where one is already loaded, the tracks don't load. This made me think there was something seriously wrong with my data. Anyway, it's all good now.

I didn't really understand what 3D-DNA was doing, and I was wondering why the early hic files in the pipeline looked better than the final output. After reading the supplementary material and getting some higher coverage data, I understand why that was.

I've since moved forward with a 2.hic file from a run with some higher coverage data. I would just like your opinion on which assembly I should move forward with.

This assembly is around 840Mb and I can clearly see the 10 chromosomes. The length of all the chromosomes minus debris is roughly what I expect the actual genome size to be. I notice there are some little crosses with low coverage in the superscaffolds, and I'm not really sure whether I should assign those regions as debris. What are your thoughts? In chromosome 1, for example, there is a pretty big gap in coverage in the middle, but I can also see that this region has more interactions with chromosome 1 than with any other part of the assembly.

Screen Shot 2024-10-01 at 2.19.47 PM.png

I also have a review assembly file where I went through and assigned all of these little gaps as debris, but it did shrink my assembly to about 750Mb. Would you consider this assembly better despite the large drop in sequence placed into chromosomes?
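
To compare how much sequence each version places into chromosomes, a rough sketch like the one below could tally superscaffold lengths from a .assembly file. It assumes the layout I understand the format to have: header lines of the form `>name index length`, followed by one line per superscaffold listing signed fragment indices. It also assumes the first 10 superscaffold lines are the chromosomes, which depends on how the review file was ordered. The function names are made up for illustration.

```python
def superscaffold_lengths(assembly_lines):
    """Return the total length (bp) of each superscaffold line.

    Assumed layout: '>name index length' header lines, then one line per
    superscaffold of signed fragment indices (the sign is the orientation).
    """
    fragment_length = {}
    totals = []
    for line in assembly_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith(">"):
            # Header line: remember the length for this fragment index.
            fragment_length[int(fields[1])] = int(fields[2])
        else:
            # Superscaffold line: sum the lengths of its fragments.
            totals.append(sum(fragment_length[abs(int(i))] for i in fields))
    return totals


def placed_and_unplaced(totals, n_chromosomes=10):
    """Split total length into the first n superscaffolds and the rest."""
    return sum(totals[:n_chromosomes]), sum(totals[n_chromosomes:])
```

Passing an open file handle for each review .assembly file to `superscaffold_lengths` and then calling `placed_and_unplaced` on the result gives the placed versus unplaced totals to compare.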

Screen Shot 2024-10-08 at 5.14.47 PM.png
Screen Shot 2024-10-08 at 5.14.30 PM.png

Thanks for any input, and thanks for these tools! They're fantastic

Ryan

Ryan Martinez

Oct 9, 2024, 3:59:16 AM
to 3D Genomics
Just finished running BUSCO on the two assemblies as well. I do feel like a 2.3 percentage point reduction in the complete score could be acceptable. The assembly went from 848Mb to 748Mb, so I'm not surprised that some BUSCOs were lost with such a large decrease in assembly size, even if the removed sequences are mostly repetitive. Hopefully this can better inform any recommendations. I know this is kind of beyond 3D-DNA and more about what I'm trying to get out of my project, but I would appreciate any expertise on which assembly I should move forward with. If it helps, I eventually plan to do some comparative genomics followed by nutrient stress tests + RNA-seq, which makes it hard to know whether I should accept a drop in complete BUSCOs.

larger assembly:
C:97.0%[S:90.6%,D:6.4%],F:0.7%,M:2.3%,n:1614
1565 Complete BUSCOs (C)
1462 Complete and single-copy BUSCOs (S)
103 Complete and duplicated BUSCOs (D)
11 Fragmented BUSCOs (F)
38 Missing BUSCOs (M)
1614 Total BUSCO groups searched

smaller assembly:
C:94.7%[S:89.1%,D:5.6%],F:1.0%,M:4.3%,n:1614
1529 Complete BUSCOs (C)
1438 Complete and single-copy BUSCOs (S)
91 Complete and duplicated BUSCOs (D)
16 Fragmented BUSCOs (F)
69 Missing BUSCOs (M)
1614 Total BUSCO groups searched
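
In case it is useful, the comparison between the two summaries can be sketched as below. The regular expression targets the one-line summary format shown above; the function names are made up for illustration and are not part of BUSCO.

```python
import re

# Matches the one-line BUSCO summary, e.g. C:97.0%[S:90.6%,D:6.4%],F:0.7%,M:2.3%,n:1614
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into percentages plus the group count n."""
    match = BUSCO_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values = {key: float(val) for key, val in match.groupdict().items()}
    values["n"] = int(values["n"])
    return values


def busco_delta(before, after):
    """Percentage-point change per category (after minus before)."""
    return {key: round(after[key] - before[key], 1) for key in "CSDFM"}


larger = parse_busco_summary("C:97.0%[S:90.6%,D:6.4%],F:0.7%,M:2.3%,n:1614")
smaller = parse_busco_summary("C:94.7%[S:89.1%,D:5.6%],F:1.0%,M:4.3%,n:1614")
print(busco_delta(larger, smaller))
```

In absolute terms, that is 36 fewer complete BUSCOs (1565 to 1529) and 31 more missing (38 to 69) out of 1614.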

Thanks so much!