How to interpret "identity [%]"

Andrew Calcino

Feb 22, 2017, 11:31:10 AM
to Redundans
Just a bit of background: I'm building a highly heterozygous, highly repetitive 1.6 Gb mollusc genome with the help of Redundans, and so far I'm very happy with the result. My initial contig assembly was performed with ABySS, which I believe is pretty strict about breaking contigs whenever it encounters heterozygosity. Because of this, I started Redundans with an assembly of over 6 million contigs and an N50 of 682 bp. With the help of Redundans this is now at just under 200,000 contigs with an N50 over 30 kb, and things are continuing to improve.

My question concerns the identity [%] value that is reported after the initial reduction step. This is what I got:

file name: reduced/contigs.fa
genome size: 2380283501
contigs: 6058590
heterozygous size: 1117121448 (46.93%)
heterozygous contigs: 5173494 (85.39%)
identity [%]: 77.536
possible joins: 0
homozygous size: 1263162053 (53.07%)
homozygous contigs: 885096 (14.61%)
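For anyone reading the report line programmatically, here is a minimal Python sketch that pairs each bare "[%]" column with the column before it and re-derives the percentages from the raw counts. The function name `parse_report` is made up for this example, and the two strings are simply the header and row pasted from this post (assumed to be tab-separated, or separated by runs of spaces as they appear here). It is not part of Redundans.

```python
import re

# Header and row copied from the report quoted in this post.
HEADER = ("#file name    genome size    contigs    heterozygous size    [%]    "
          "heterozygous contigs    [%]    identity [%]    possible joins    "
          "homozygous size    [%]    homozygous contigs    [%]")
ROW = ("reduced/contigs.fa    2380283501    6058590    1117121448    46.93    "
       "5173494    85.39    77.536    0    1263162053    53.07    885096    14.61")


def parse_report(header, row):
    """Return a dict mapping column names to string values.

    A bare "[%]" column is renamed to "<previous column> [%]" so that
    the four percentage columns get distinct keys.
    """
    split = re.compile(r"\t|\s{2,}")  # tabs, or runs of two or more spaces
    names = split.split(header.lstrip("#").strip())
    values = split.split(row.strip())
    record = {}
    previous = None
    for name, value in zip(names, values):
        if name == "[%]":
            name = f"{previous} [%]"
        else:
            previous = name
        record[name] = value
    return record


report = parse_report(HEADER, ROW)
genome_size = int(report["genome size"])
het_size = int(report["heterozygous size"])
hom_size = int(report["homozygous size"])

# The heterozygous and homozygous sizes should add up to the genome size,
# and the reported percentages should follow from the raw counts.
print("sizes add up:", het_size + hom_size == genome_size)
print("heterozygous size [%]:", round(100 * het_size / genome_size, 2))
print("homozygous size [%]:", round(100 * hom_size / genome_size, 2))
print("identity [%]:", report["identity [%]"])
```

Note that the size and contig percentages are fractions of the whole input assembly, whereas identity [%] is a separate statistic and cannot be re-derived from the other columns.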

With an identity % of 77, does this mean that on average, the heterozygous contigs are only 77.536% similar to one another? I expected this genome to be highly heterozygous but that value still seems very low to me.

If this is the case, does this mean that I am going to have problems mapping genes located in these highly dissimilar heterozygous regions back to my genome? I can imagine that if, after reduction, a gene maps to a region which only shares 77% sequence identity between alleles, it's going to be very difficult to confidently align it to the genome.

l.p.p...@gmail.com

Feb 22, 2017, 11:48:59 AM
to Andrew Calcino, Redundans
Hi Andrew, 

As you said, 77% is quite low. Most likely, many quite ancient duplications and repeats were removed. To estimate which identity cut-off will work best for you, have a look at the identity histogram in redundans_out/contigs.reduced.fa.hist.png

There is some more info about the histogram here: https://github.com/lpryszcz/redundans/tree/master/test#reduction
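To illustrate why a single mean identity can look low even when true alleles are very similar, here is a small Python sketch. The (identity, length) pairs are invented toy numbers, not values from this assembly, and the length-weighted mean is an assumption made for the example; how Redundans computes the reported identity [%] is not stated in this thread.

```python
# Toy (identity %, aligned length in bp) pairs, invented for illustration only.
# The first three stand in for true allelic pairs, the last two for old
# duplications or repeats that were also collapsed.
matches = [(99.0, 5000), (98.0, 4000), (97.5, 3000), (60.0, 6000), (55.0, 4000)]


def weighted_mean_identity(pairs):
    """Length-weighted mean identity of (identity, length) pairs."""
    total = sum(length for _, length in pairs)
    return sum(identity * length for identity, length in pairs) / total


def above_cutoff(pairs, cutoff):
    """Keep only the pairs whose identity is at least the cut-off."""
    return [(identity, length) for identity, length in pairs if identity >= cutoff]


overall = weighted_mean_identity(matches)
allelic = weighted_mean_identity(above_cutoff(matches, 90.0))
print(f"mean identity, all matches:  {overall:.2f}%")
print(f"mean identity, >= 90% only:  {allelic:.2f}%")
```

In this toy case the low-identity tail pulls the overall mean down to about 80%, while the matches above the cut-off average above 98%. The histogram is the place to check whether the real data have a similar two-peak shape.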

Bests, 
L.
