Just a bit of background: I'm building a highly heterozygous, highly repetitive 1.6 Gb mollusc genome with the help of Redundans, and so far I'm very happy with the result. My initial contig assembly was performed with ABySS, which I believe is pretty strict about breaking contigs whenever it encounters heterozygosity. Because of this, I started Redundans with an assembly of over 6 million contigs and an N50 of 682 bp. With the help of Redundans this is now at just under 200,000 contigs with an N50 over 30 kb, and things are continuing to improve.
My question concerns the identity [%] value that is reported after the initial reduction step. This is what I got:
#file name: reduced/contigs.fa
genome size: 2380283501
contigs: 6058590
heterozygous size: 1117121448 (46.93%)
heterozygous contigs: 5173494 (85.39%)
identity [%]: 77.536
possible joins: 0
homozygous size: 1263162053 (53.07%)
homozygous contigs: 885096 (14.61%)
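As a sanity check on how I'm reading that row, here is a minimal Python sketch that recomputes the size and contig percentages from the raw counts. The dictionary keys are my own labels, not Redundans field names, and I'm assuming each [%] column is simply the preceding count divided by the corresponding total. It says nothing about how the identity value itself is calculated, which is the part I'm unsure about.

```python
# Values copied from the reduced/contigs.fa row above.
# The key names are my own labels, not Redundans identifiers.
row = {
    "genome_size": 2380283501,
    "contigs": 6058590,
    "het_size": 1117121448,
    "het_contigs": 5173494,
    "hom_size": 1263162053,
    "hom_contigs": 885096,
}


def pct(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Heterozygous and homozygous parts should add up to the totals.
sizes_add_up = row["het_size"] + row["hom_size"] == row["genome_size"]
contigs_add_up = row["het_contigs"] + row["hom_contigs"] == row["contigs"]

# Assumption: each [%] column is count / total for that category.
het_size_pct = pct(row["het_size"], row["genome_size"])
het_contigs_pct = pct(row["het_contigs"], row["contigs"])
hom_size_pct = pct(row["hom_size"], row["genome_size"])
hom_contigs_pct = pct(row["hom_contigs"], row["contigs"])

print("sizes add up:", sizes_add_up)
print("contigs add up:", contigs_add_up)
print("heterozygous size [%]:", het_size_pct)
print("heterozygous contigs [%]:", het_contigs_pct)
print("homozygous size [%]:", hom_size_pct)
print("homozygous contigs [%]:", hom_contigs_pct)
```

Under that assumption the recomputed percentages match the reported ones, so the heterozygous fraction is the share of the input assembly flagged as redundant and is a separate quantity from the 77.536 identity figure.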
With an identity of 77.536%, does this mean that, on average, the heterozygous contigs are only 77.536% similar to one another? I expected this genome to be highly heterozygous, but that value still seems very low to me.
If this is the case, does it mean that I am going to have problems mapping genes located in these highly dissimilar heterozygous regions back to my genome? I can imagine that if, after reduction, a gene maps to a region where the alleles share only 77% sequence identity, it's going to be very difficult to align it to the genome with confidence.