How to define the repeats and keep or exclude based on HiC contact map


amit8...@gmail.com

Jan 18, 2022, 4:26:57 AM
to 3D Genomics
Hi,

Thank you so much for Juicer, the 3D-DNA pipeline, and the option to edit the contact map in order to correct misalignments.

I started a genome assembly for a highly heterozygous plant genome (genome size 420 Mb) using a HiFi sequence dataset (raw read N50 of 25 kb). Using a total of 37x genome coverage and hifiasm, followed by purge_dups-based analysis, I obtained a contig-level assembly of 54 contigs with a contig N50 of 36 Mb. This suggests that the assembly is good, that it includes the repeats, and that several chromosomes should be gapless.
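For readers unfamiliar with the metric, contig N50 is the length L such that contigs of length L or longer together cover at least half of the total assembly. A minimal sketch follows; the function name `n50` and the example lengths are illustrative assumptions, not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length of the contig at which the running sum of
    lengths, taken from longest to shortest, first reaches half
    of the total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths (arbitrary units), not the real assembly.
example_lengths = [50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
print(example_n50)
```

In practice the lengths would be read from the assembly FASTA or its `.fai` index rather than typed in by hand.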

I next used the Arima HiC kit1, and the HiC reads were mapped using Juicer. I followed this with scaffolding using the 3D-DNA pipeline, which is still running. I have two questions:
1. The LASTZ-based alignment has been running for over a week. I was able to get the .rawchrom.fasta file. Is this normal, and can I speed up the process?

2. Since the genome was well assembled, I checked the assembly.0.hic file using Juicebox. The contact map looks good overall, but I am unsure about two regions.
The overall HiC contact map is shown below:
HiC_overall_contact_map.PNG

The plant that I am working on has eight chromosomes, and I can see these eight chromosomes in the map; the rest seem to be repeats that were not aligned or not placed with any pseudomolecule. Chromosome 2 (second from the top) shows a strange shape, and I am not sure whether I need to correct it, although I suspect that it could be a repeat. At higher resolution, you can see two blocks:
HiC_overall_contact_map_chromosome 2-1.png 

If I remove the regions marked by the green lines, the assembly looks like this:
HiC_overall_contact_map_overall_chromosome 2_removal_of_repeats.PNG

The black line just shows the section that got excluded. 

I am worried that I may be artificially excluding repeats that should be there. Since this assembly was generated from HiFi data and I used stringent conditions, should I accept these structures as real, or should I exclude them?

I am using Juicebox version 1.11.08, and the 3D-DNA pipeline was downloaded on 18 July 2021. My server has 1 TB of RAM and 128 cores.

Kindly let me know if you need any information in order to help me out. Thank you so much in advance,

with best regards
Amit

Amit Rai

Jan 20, 2022, 12:12:32 AM
to 3D Genomics
Hi,

I am so sorry for asking again, and I do understand how busy you guys are while helping so many people.

I was wondering whether the issue that I asked for help with can be resolved.

thank you so much,

with best regards
Amit


Olga Dudchenko

Feb 8, 2022, 7:57:18 PM
to 3D Genomics
Hey Amit,

Sorry for the delay in answering! A few things.

1) You mention LASTZ. This suggests that you are running 3d-dna in diploid mode. It is unlikely that you need to do this, given that you have already attempted to remove alternative haplotypes using purge_dups.

2) I can see from the images you shared that your coverage varies greatly across the genome. This is not unusual for plant libraries, but it might mean that you need to change the default parameters of 3d-dna. It looks like you have empirically done that by using the results from the early exit, i.e. 0.hic and .0.assembly.

3) The weird patterns look like centromeric repeats. I cannot judge from a static image whether they are all correctly assigned, but I also do not see anything that says they are definitely wrong. The interactions with other chromosomes that you are seeing are, if I had to guess, highlighted by the balanced normalization mode (your screenshot does not show which normalization is in use). If there is some sequence similarity between the centromeres of different chromosomes (which is likely), that would cause the centromere-to-centromere interactions to be over-highlighted.
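To illustrate why a normalized view can over-highlight such contacts, here is a toy sketch. It assumes that repeat-rich centromeric bins receive fewer mapped reads than chromosome arms, and it uses a simple one-pass coverage normalization as a stand-in for iterative matrix balancing; the bin layout, counts, and the helper name `coverage_normalize` are all hypothetical and not taken from the thread's data.

```python
def coverage_normalize(matrix):
    """Divide each contact count by the product of its row and column coverage.

    This is a simplified, single-pass coverage normalization, used here
    only as a stand-in for iterative matrix balancing.
    """
    coverage = [sum(row) for row in matrix]
    size = len(matrix)
    return [
        [matrix[i][j] / (coverage[i] * coverage[j]) for j in range(size)]
        for i in range(size)
    ]


# Hypothetical 4-bin symmetric contact matrix:
#   bin 0 = centromere of chromosome A (low coverage)
#   bin 1 = arm of chromosome A
#   bin 2 = centromere of chromosome B (low coverage)
#   bin 3 = arm of chromosome B
raw = [
    [10, 5, 2, 1],
    [5, 100, 1, 4],
    [2, 1, 10, 5],
    [1, 4, 5, 100],
]

norm = coverage_normalize(raw)

# Raw counts: the arm-to-arm inter-chromosomal contact is the larger one.
print("raw  cen-cen:", raw[0][2], " arm-arm:", raw[1][3])
# Normalized: the centromere-to-centromere contact now stands out.
print("norm cen-cen:", norm[0][2], " arm-arm:", norm[1][3])
```

Switching the normalization off (raw counts) in the viewer is one way to check whether a highlighted inter-chromosomal signal is driven by low coverage in those bins.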

Hope this helps,
Olga
