Min coverage needed for assembling highly heterozygous genomes


mht

May 10, 2017, 7:29:13 AM
to Redundans
Hi,

I am trying to assemble an invertebrate (mollusc) genome that is known to be highly heterozygous. From what I understand, assembling a highly heterozygous genome should produce an assembly larger than the estimated genome size (which Redundans can then shrink by reducing the redundancy). However, in my case the resulting assembly is a lot smaller than the expected size, and only 40% of the total reads map back to it.

From my total Illumina reads I expected 50x coverage of the genome, but the observed peaks are lower: ~13x for heterozygous regions (1C peak) and ~28x for homozygous regions (2C peak). Based on everyone's experience with polymorphic genomes, could a lack of coverage in the heterozygous regions be the reason for the incomplete, smaller assembly?

The genome size is expected to be 1.5 Gb, and I have tried both Platanus and MaSuRCA to assemble the genome from my PE library reads.

Appreciate any kind of input!

Richards, Stephen

May 10, 2017, 11:55:49 AM
to mht, Redundans
Did the assembly size drop during the Redundans process, or was it too small from the earliest steps?
I have had issues where intermediate steps failed (due to cluster issues, not software issues) but the run continued, and I did not realize it only had half an assembly. Checking the size at every step and re-running helped.
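
One way to check the size "all the way along" is to total the bases in each intermediate FASTA file and compare the numbers from step to step. Below is a minimal sketch in plain Python; the function name `fasta_stats` and the file name in the usage comment are made up for illustration and are not part of Redundans or Platanus.

```python
def fasta_stats(lines):
    """Return (sequence count, total bases, N50) for FASTA-formatted lines."""
    lengths = []
    current = 0
    started = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if started:
                lengths.append(current)
            current = 0
            started = True
        elif line:
            current += len(line)
    if started:
        lengths.append(current)

    total = sum(lengths)
    # N50: length of the sequence at which the running sum, taken from the
    # longest sequence downwards, first reaches half of the total size.
    running, n50 = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return len(lengths), total, n50


# Usage (hypothetical file name):
# with open("contigs.fa") as handle:
#     print(fasta_stats(handle))
```

A sudden drop in total bases between two consecutive steps would point to a failed step rather than to genuine redundancy reduction.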

It's also possible that a lot of repeats were collapsed, but it's hard to tell whether that is the correct answer.
fringy

--
You received this message because you are subscribed to the Google Groups "Redundans" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redundans+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/redundans/4816bb03-a977-4486-9cb2-b66e7a4cf2f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

l.p.p...@gmail.com

May 12, 2017, 11:31:33 AM
to Richards, Stephen, mht, Redundans
Hi, 

Are you talking about the initial assembly (from Platanus) or the assembly resulting from Redundans?
The fact that you are able to align only 40% of the reads is odd. Check the unmapped reads: are they low quality?
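
A rough way to answer the "low quality?" question, once the unmapped reads have been written out as FASTQ, is to look at the share of reads with a low mean base quality. This is only a sketch: it assumes standard four-line FASTQ records with Phred+33 encoding, the threshold of 20 is an arbitrary choice, and the function names are hypothetical. An arithmetic mean of Phred scores is a crude summary, so treat the result as a first look.

```python
def mean_phred(quality_line, offset=33):
    """Arithmetic mean of the Phred scores in one FASTQ quality string."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores) if scores else 0.0


def fraction_low_quality(fastq_lines, threshold=20.0):
    """Fraction of reads whose mean Phred score is below `threshold`.

    Assumes four lines per record (header, sequence, '+', qualities).
    """
    low = 0
    total = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:  # the fourth line of each record holds qualities
            total += 1
            if mean_phred(line.rstrip("\r\n")) < threshold:
                low += 1
    return low / total if total else 0.0


# Usage (hypothetical file name):
# with open("unmapped.fastq") as handle:
#     print(fraction_low_quality(handle))
```

If most unmapped reads turn out to be of good quality, read quality is unlikely to explain the 40% mapping rate, and missing sequence in the assembly becomes the more plausible explanation.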

What is your library exactly (read length, insert size)? 

50x is rather low for a polymorphic genome. If you really see only 13x in heterozygous (1C) regions, that could explain why your assembly is incomplete. If it's really 50x, you should see ~25x in heterozygous regions and ~50x in homozygous regions. Maybe you underestimated your genome size prior to sequencing?
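
The arithmetic behind this can be sketched in a few lines of Python, using only the numbers from the thread (1.5 Gb expected size, 50x expected coverage, a 2C peak at ~28x). The function names are made up for illustration. The sketch treats the peak depth as base coverage; if the peaks come from a k-mer histogram, k-mer depth is somewhat lower than base depth, so the implied size below would be an overestimate.

```python
def expected_peaks(total_coverage):
    """Expected (heterozygous, homozygous) depth in a diploid genome.

    Each haplotype receives half of the reads, so heterozygous (1C)
    regions sit at half the depth of homozygous (2C) regions.
    """
    return total_coverage / 2, float(total_coverage)


def implied_genome_size(expected_size_gb, expected_coverage, observed_hom_depth):
    """Genome size (Gb) implied by the observed homozygous-peak depth.

    Total sequenced bases = expected size * expected coverage; dividing
    by the observed 2C depth gives the size that depth would correspond to.
    """
    total_sequenced_gb = expected_size_gb * expected_coverage
    return total_sequenced_gb / observed_hom_depth


het, hom = expected_peaks(50)
print(f"expected peaks at 50x: {het:.0f}x (1C), {hom:.0f}x (2C)")
print(f"implied size: {implied_genome_size(1.5, 50, 28):.2f} Gb")
```

Under these assumptions, the observed 13x and 28x peaks are roughly half of the 25x and 50x expected, which is consistent with the suggestion that the genome may be larger than 1.5 Gb.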

If I'm correct, Platanus already does some assembly reduction (check the -u parameter). I don't know about MaSuRCA. So, as fringy said, your assembly will be much smaller if the genome is repeat-rich and those repeats are removed by Platanus.

Do you have any mate-pair libraries? If so, you should recover at least some of the missing parts (repeats) during scaffolding.

Bests,
L. 