Hello,
I am carrying out some basic summary statistics on a GBS-derived SNP dataset.
I initially used a highly filtered dataset (filtered for depth, missingness, LD, and HWE) and used that SNP data to assess population structure. As expected, I recapitulated known relationships.
I then used the same dataset to calculate some basic diversity statistics and got some unexpected results. The populations were defined based on the STRUCTURE analysis.
I went back to a less stringently filtered dataset and re-ran some of the same basic summaries, but I still see similar results across all populations. What I find strange is that the signal appears to be consistent between populations (not what I expected) and that the Fis values are strongly negative. If this is a result of calculating the statistics on all populations simultaneously, I can understand why that might happen. When I subset by population, though, many loci return NA values, and the negative Fis values persist.
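To make the two symptoms concrete, here is a minimal sketch of the per-locus arithmetic in plain Python. It is not the code of any particular package: the function name `locus_fis` is hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1, 2, with `None` for missing), and expected heterozygosity is the simple uncorrected 2pq, so the numbers may differ slightly from estimators that apply a sample-size correction.

```python
from typing import Optional, Sequence, Tuple

Stat = Optional[float]


def locus_fis(genotypes: Sequence[Optional[int]]) -> Tuple[Stat, Stat, Stat]:
    """Return (Ho, He, Fis) for one biallelic locus in one population.

    Assumption: genotypes are alternate-allele counts (0, 1, 2);
    None marks a missing call. He is the uncorrected 2pq.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        # No genotype calls at all: nothing can be estimated.
        return None, None, None

    p = sum(called) / (2 * n)                    # alternate allele frequency
    ho = sum(1 for g in called if g == 1) / n    # observed heterozygosity
    he = 2 * p * (1 - p)                         # expected heterozygosity

    if he == 0:
        # Locus is monomorphic in this subset: Fis = 1 - Ho/He is 0/0,
        # which is the kind of value that would be reported as NA.
        return ho, he, None
    return ho, he, 1 - ho / he


# Every individual heterozygous: Ho exceeds He, so Fis is strongly negative.
print(locus_fis([1, 1, 1, 1]))
# Locus fixed within the subset: Fis is undefined.
print(locus_fis([0, 0, 0, None]))
# Genotype counts matching Hardy-Weinberg proportions: Fis is zero.
print(locus_fis([0, 1, 1, 2]))
```

Under these assumptions, a locus that is polymorphic across the full dataset but fixed within one population has He = 0 in that subset, so Fis is undefined there, while any locus with more heterozygotes than 2pq predicts gives a negative Fis regardless of how the populations are subset.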
Has anyone encountered similar issues, or does anyone have any recommendations?
Thanks in advance.
Josh