Hi Julian et al.,
We are working with a species with a lot of variation. There are many rare SNPs, and there is usually more than one SNP per RAD locus. For some of our analyses we would like a Genepop file filtered so that it contains only SNPs with a MAF above, say, 0.1. Is there a smart way to achieve this? In populations we can choose to exclude SNPs below a certain MAF, but this only affects the Fst output, not the Genepop file. Of course, the Fst file could be used to create a whitelist, which could then be used to extract the SNPs of interest from the Genepop file, but this seems tedious. Would it be possible to add a filter to populations so that only certain SNPs are exported to the Genepop file, or is there a good alternative approach?
Any help would be much appreciated.
Best regards,
Michael
Hi Julian,
Thanks for your excellent explanation and suggestions! I think option 1 would be preferable. The reason I asked is that we want to make input files for BayeScan and similar methods, and we do not want a lot of SNPs that are rare across all populations. So the criterion should be that a SNP is excluded only if its MAF is below the threshold (say 0.1) in all populations; if it is < 0.1 in all but one population and > 0.1 in that one population, it might be biologically significant and should be kept. To answer your question: for our purpose the SNP should be omitted from all samples.
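To make the criterion concrete, here is a minimal Python sketch of the rule described above. It is not anything populations does itself. The function names, the input layout (a flat list of allele calls per population) and the example data are all assumptions made for illustration. It also assumes biallelic SNPs, takes the minor allele to be the one that is rarer in the pooled sample, and treats a MAF of exactly 0.1 as passing.

```python
from collections import Counter


def per_population_maf(alleles_by_pop):
    """Frequency of the globally minor allele in each population.

    alleles_by_pop maps a population name to a flat list of allele calls
    (two per diploid individual, missing calls already removed).
    Assumes a biallelic SNP.
    """
    pooled = Counter()
    for alleles in alleles_by_pop.values():
        pooled.update(alleles)
    if len(pooled) < 2:
        # Monomorphic across all populations: there is no minor allele.
        return {pop: 0.0 for pop in alleles_by_pop}
    # The least common allele in the pooled sample is the minor allele.
    minor = pooled.most_common()[-1][0]
    return {
        pop: (alleles.count(minor) / len(alleles) if alleles else 0.0)
        for pop, alleles in alleles_by_pop.items()
    }


def keep_snp(alleles_by_pop, threshold=0.1):
    """Keep the SNP unless its MAF is below threshold in every population."""
    mafs = per_population_maf(alleles_by_pop)
    return any(freq >= threshold for freq in mafs.values())


# Hypothetical example data: two populations, 10 diploid individuals each.
rare_everywhere = {"popA": list("A" * 19 + "G"), "popB": list("A" * 20)}
common_in_one = {"popA": list("A" * 20), "popB": list("A" * 14 + "G" * 6)}

print(keep_snp(rare_everywhere))  # False: MAF is 0.05 and 0.0
print(keep_snp(common_in_one))    # True: MAF reaches 0.3 in popB
```

Using the pooled minor allele rather than each population's own rarer allele means that a SNP close to fixation for different alleles in different populations is still kept, which seems desirable for outlier scans.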
Your suggestion to make a blacklist is very good. However, the particular species that we work with has an awful lot of variation, and most RAD loci have more than one SNP. If I understand you correctly, we would exclude the whole RAD locus if it has a SNP below a certain MAF, even if there is another SNP at that locus with a MAF above the threshold. In our specific case this would mean losing a lot of data. So, ideally, we would want to omit a SNP whose MAF is below the threshold while keeping any other SNP at the same RAD locus whose MAF is above it.
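The difference between the two approaches can be illustrated with a small Python sketch. The data layout, with SNPs keyed by a (locus ID, SNP position) pair and a per-population MAF for each, is hypothetical and chosen only for this example; it does not correspond to any Stacks file format.

```python
def compare_filters(snp_mafs, threshold=0.1):
    """Compare per-SNP filtering with blacklisting whole loci.

    snp_mafs maps (locus_id, snp_position) to a dict of
    population name -> MAF. A SNP fails when its MAF is below
    the threshold in every population.
    Returns (snps_kept_per_snp, snps_kept_per_locus).
    """
    def passes(mafs):
        return max(mafs.values(), default=0.0) >= threshold

    # Per-SNP filter: each SNP is judged on its own.
    per_snp = sorted(snp for snp, mafs in snp_mafs.items() if passes(mafs))

    # Locus blacklist: one failing SNP removes the whole locus.
    bad_loci = {snp[0] for snp, mafs in snp_mafs.items() if not passes(mafs)}
    per_locus = sorted(snp for snp in snp_mafs if snp[0] not in bad_loci)
    return per_snp, per_locus


# Hypothetical example: locus 101 carries one common and one rare SNP.
example = {
    (101, 12): {"popA": 0.35, "popB": 0.28},
    (101, 57): {"popA": 0.02, "popB": 0.00},
    (102, 8): {"popA": 0.05, "popB": 0.20},
}

per_snp, per_locus = compare_filters(example)
print(per_snp)    # [(101, 12), (102, 8)]
print(per_locus)  # [(102, 8)]
```

In this toy example the locus-level blacklist discards the informative SNP at position 12 of locus 101 along with the rare one, which is the data loss we would like to avoid.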
Best regards,
Michael
Hi Julian,
Thanks a lot! It will make life much easier for us when this is implemented. I am sure other people working with species showing very high levels of variation will also appreciate this. I hope it will not be too technically complicated.
The species we work with presumably has a very high Ne, and that underlies the high amount of variation. So, in this case there could be both SNPs with relatively high MAF and others with very low MAF at the same RAD locus; due to the high Ne a lot of new mutations are retained within the population. We also have RAD data from other organisms, where there are more “normal” levels of variation, so we do not think the high level of variation we see in this particular species is due to technical artifacts (and we have of course done our best to filter for quality).
Best regards,
Michael
--
Julian M Catchen, Ph.D.
Institute of Ecology and Evolution
University of Oregon
--
jcat...@uoregon.edu
http://www.uoregon.edu/~jcatchen/