Hi,
Could someone explain how the kmer frequency information in the ".kmerFreq" file is gathered during the pregraph phase of SOAPdenovo?
I would like to interpret an unexpected result from a pregraph run on 20x data (the genome size is only a rough estimate) with a kmer size of 17 (-K 17). See the attached image for the plot: the peak is at 6x or 7x, which suggests a larger genome than estimated. Could there be another explanation, such as an issue with ploidy or heterozygosity? I would appreciate any thoughts on this matter.
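For context, here is a rough sketch of the arithmetic behind that reading. It assumes the commonly used relation for error-free reads of a single length, kmer depth = base depth * (L - K + 1) / L. The read length L = 100 is a placeholder assumption, not a value from this run, and the function names are purely illustrative (they are not part of SOAPdenovo).

```python
# Sketch of the kmer depth arithmetic, assuming error-free reads of one length.
# Function names are illustrative only; they are not part of SOAPdenovo.


def expected_kmer_depth(base_depth, read_len, k):
    """Expected kmer depth given base depth, read length and kmer size."""
    if k > read_len:
        raise ValueError("k must not exceed the read length")
    # Each read of length L contributes L - k + 1 kmers.
    return base_depth * (read_len - k + 1) / read_len


def implied_base_depth(kmer_peak, read_len, k):
    """Invert the relation: base depth implied by an observed kmer peak."""
    if k > read_len:
        raise ValueError("k must not exceed the read length")
    return kmer_peak * read_len / (read_len - k + 1)


def genome_size_ratio(nominal_base_depth, kmer_peak, read_len, k):
    """Factor by which the genome would exceed the rough estimate,
    if the observed peak really is the main (homozygous) peak."""
    return nominal_base_depth / implied_base_depth(kmer_peak, read_len, k)


if __name__ == "__main__":
    READ_LEN = 100  # ASSUMPTION: placeholder read length, not from the run
    K = 17          # from the post (-K 17)
    NOMINAL = 20    # from the post (20x data)

    print("expected kmer depth:", expected_kmer_depth(NOMINAL, READ_LEN, K))
    for peak in (6, 7):  # observed peak positions from the plot
        print(
            "peak", peak,
            "implied base depth:", implied_base_depth(peak, READ_LEN, K),
            "genome size ratio:", genome_size_ratio(NOMINAL, peak, READ_LEN, K),
        )
```

This only covers the "larger genome" reading. Sequencing errors also pull the kmer depth below the error-free expectation, and if the 6x or 7x peak were a heterozygous peak, a second peak at roughly twice that depth would be expected in the same plot.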
Best,
Radhika