How to interpret "identity [%]"

Andrew Calcino

Feb 22, 2017, 11:31:10 AM

to Redundans

Just a bit of background: I'm assembling a highly heterozygous, highly repetitive 1.6 Gb mollusc genome with the help of Redundans, and so far I'm very happy with the result. My initial contig assembly was performed with ABySS, which I believe is pretty strict about breaking contigs whenever it encounters heterozygosity. Because of this, I started Redundans with an assembly of over 6 million contigs and an N50 of 682 bp. With the help of Redundans, this is now at just under 200,000 contigs with an N50 over 30 kb, and things are continuing to improve.

My question regards the identity [%] which is reported after the initial reduction step. This is what I got:

#file name:                reduced/contigs.fa
genome size:               2380283501
contigs:                   6058590
heterozygous size [%]:     1117121448 (46.93)
heterozygous contigs [%]:  5173494 (85.39)
identity [%]:              77.536
possible joins:            0
homozygous size [%]:       1263162053 (53.07)
homozygous contigs [%]:    885096 (14.61)
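
The percentage fields in this summary can be cross-checked against the raw counts. What follows is a minimal Python sketch, assuming the values are whitespace-separated in the column order reported above. The helper names `parse_summary_row` and `percent` are illustrative and not part of Redundans. Note that identity [%] cannot be recomputed from this row alone, so the sketch only reports it.

```python
# Column order as reported in the Redundans summary quoted above.
# The "[%]" suffixes on the size/contig percentages are added here for clarity.
COLUMNS = [
    "file name", "genome size", "contigs",
    "heterozygous size", "heterozygous size [%]",
    "heterozygous contigs", "heterozygous contigs [%]",
    "identity [%]", "possible joins",
    "homozygous size", "homozygous size [%]",
    "homozygous contigs", "homozygous contigs [%]",
]

# The summary row from the post (assumed whitespace-separated).
ROW = ("reduced/contigs.fa 2380283501 6058590 1117121448 46.93 "
       "5173494 85.39 77.536 0 1263162053 53.07 885096 14.61")


def parse_summary_row(row):
    """Split one summary row into a dict keyed by column name (hypothetical helper)."""
    values = row.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def percent(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


stats = parse_summary_row(ROW)
genome_size = int(stats["genome size"])
n_contigs = int(stats["contigs"])

# Recompute each percentage from the raw counts.
het_size_pct = percent(int(stats["heterozygous size"]), genome_size)
het_contigs_pct = percent(int(stats["heterozygous contigs"]), n_contigs)
hom_size_pct = percent(int(stats["homozygous size"]), genome_size)
hom_contigs_pct = percent(int(stats["homozygous contigs"]), n_contigs)

print("heterozygous size [%]:   ", het_size_pct)
print("heterozygous contigs [%]:", het_contigs_pct)
print("homozygous size [%]:     ", hom_size_pct)
print("homozygous contigs [%]:  ", hom_contigs_pct)
print("identity [%] (reported): ", float(stats["identity [%]"]))
```

The recomputed values match the reported ones, and the heterozygous and homozygous counts add up to the totals, so the size and contig percentages are plain fractions of the input assembly.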

Does an identity of 77.536% mean that, on average, the heterozygous contigs are only 77.536% similar to one another? I expected this genome to be highly heterozygous, but that value still seems very low to me.

If this is the case, does it mean that I am going to have problems mapping genes located in these highly dissimilar heterozygous regions back to my genome? I can imagine that if, after reduction, a gene maps to a region that shares only 77% sequence identity between alleles, it's going to be very difficult to align it to the genome confidently.

l.p.p...@gmail.com

Feb 22, 2017, 11:48:59 AM

to Andrew Calcino, Redundans

Hi Andrew,

As you said, 77% is quite low. Most likely, many quite ancient duplications and repeats were removed. To estimate which identity cut-off will be best for you, have a look at the identity histogram in redundans_out/contigs.reduced.fa.hist.png.

Here is some more info about the histogram: https://github.com/lpryszcz/redundans/tree/master/test#reduction

Bests,
