Hi,
I am using SOAPdenovo2 to assemble a nematode genome that appears to be highly heterozygous, and I would like to better understand how SOAPdenovo2 handles heterozygosity during assembly.
We roughly estimate that divergence between homologous regions can be as high as 70% in some intergenic regions.
From reading the SOAPdenovo articles, I understand that heterozygosity can be resolved at two levels: at the contig level, by merging bubbles in the de Bruijn graph, and at the scaffold level, by favoring the contig with higher depth when a heterozygous contig pair competes for the same position in a scaffold.
However, I cannot find details on exactly how the algorithm behaves in these two situations. What thresholds are used? How are heterozygous contigs defined?
Also, which options and parameters can we adjust to reduce heterozygosity in the assembly and get as close as possible to a haploid genome (mergeLevel? minContigCvg? bubbleCoverage?)
Thanks in advance for your help.
Helene