Dear All,
We are trying to work out a specific problem, and I was hoping for your advice.
We are looking at natural variation in dauer formation in C. elegans.
We have genome sequence data for 20 recent wild isolates of C. elegans. The sequencing was performed on genetically homogeneous samples that should be homozygous at all loci. When I refer to SNPs, I mean differences between these 20 strains and the reference genome.
Previous members of my lab have also performed QTL mapping, which identified an interesting region on chromosome II that appears to contribute to the natural variation in dauer formation between strains.
The mapped region is ~200,000 bp and contains 76 genes. We think that there is unusually high genetic diversity in this region, especially in terms of non-synonymous SNPs in coding regions. We suspect there are more SNPs than average, but also that there is a high diversity of alleles amongst the strains. How would we test if this is the case?
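To make the two quantities concrete, here is a minimal sketch of how they could be computed for one region. It assumes each strain can be treated as haploid (since the samples should be homozygous) and that the calls have already been reduced to one allele string per strain over the variable sites. The function names and the `n_sites` argument (the number of callable sites in the region, variable or not) are hypothetical and not taken from any existing tool.

```python
from itertools import combinations


def nucleotide_diversity(genotypes, n_sites):
    """Average number of pairwise differences per site.

    genotypes: one sequence of alleles per strain, all the same length,
               covering only the variable sites of the region.
    n_sites:   total number of callable sites in the region (assumed known).
    """
    pairs = list(combinations(genotypes, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


def distinct_haplotypes(genotypes):
    """Number of different allele combinations seen among the strains."""
    return len({tuple(g) for g in genotypes})


# Toy example with 4 strains and 3 variable sites in a 10 bp region.
toy = ["AAT", "AAT", "GAT", "GCT"]
pi = nucleotide_diversity(toy, n_sites=10)
n_hap = distinct_haplotypes(toy)
```

The first function captures "more SNPs than average" in a way that accounts for allele frequency, while the second is a crude count of how many different alleles a gene or region carries across the strains.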
Would you simply compare the number of SNPs per coding base pair in this region with the genome-wide average? Alternatively, could you compare the number of alleles present for these 76 genes with the average number of alleles per gene across the genome? Or would it be best to find comparable region(s) (in terms of physical chromosome location and gene content) and compare against those?
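As a rough illustration of the first option, the sketch below counts SNPs in the candidate region and compares that count with non-overlapping windows of the same size across the rest of the genome, giving an empirical p-value. It assumes SNP positions have already been extracted from the VCFs into plain lists per chromosome; the function names and the input layout are hypothetical. It counts all SNPs, so restricting it to non-synonymous SNPs or dividing by coding base pairs per window would need extra inputs.

```python
import bisect


def count_in_window(sorted_positions, start, end):
    """Number of SNP positions p with start <= p < end."""
    return (bisect.bisect_left(sorted_positions, end)
            - bisect.bisect_left(sorted_positions, start))


def empirical_p_value(snps_by_chrom, chrom_lengths, region, window=200_000):
    """Compare the SNP count in `region` with same-sized background windows.

    snps_by_chrom: dict of chromosome name -> list of SNP positions.
    chrom_lengths: dict of chromosome name -> length in bp.
    region:        (chromosome, start, end) of the candidate interval.
    Returns (observed count, fraction of windows with a count at least as high).
    """
    chrom, start, end = region
    observed = count_in_window(sorted(snps_by_chrom[chrom]), start, end)

    background = []
    for c, length in chrom_lengths.items():
        positions = sorted(snps_by_chrom.get(c, []))
        for w_start in range(0, length - window + 1, window):
            w_end = w_start + window
            # Skip any window that overlaps the candidate region itself.
            if c == chrom and w_start < end and w_end > start:
                continue
            background.append(count_in_window(positions, w_start, w_end))

    n_at_least = sum(b >= observed for b in background)
    # Add-one correction so the p-value is never exactly zero.
    return observed, (n_at_least + 1) / (len(background) + 1)


# Toy example: two 30 bp "chromosomes" and 10 bp windows.
toy_snps = {"I": [3, 12], "II": [5, 15, 16, 17, 25]}
toy_lengths = {"I": 30, "II": 30}
obs, p = empirical_p_value(toy_snps, toy_lengths, ("II", 10, 20), window=10)
```

Restricting the background windows to comparable regions (similar chromosome position and gene content) would be a small change to the loop that builds `background`.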
Do you know of any software to carry out this specific task?
We have annotated and filtered our VCFs using snpEff. We also have consensus genome sequences in FASTA format, as well as crude VCF files and the original sorted.bam files.
Best wishes,
Barney