Hi all,
I hope you all are doing well.
I am running pixy (https://doi.org/10.1111/1755-0998.13326) on a VCF file from ddRAD data that includes both variant and invariant sites (produced by enabling the --vcf-all flag in the populations module of Stacks, https://catchenlab.life.illinois.edu/stacks/comp/populations.php). Surprisingly, I got exactly the same dxy values when using variant sites only (with the flag "--bypass_invariant_check yes") as when using both variant and invariant sites. The only difference is in the count_missing column (larger for all sites than for variants only), but this plays no part in plotting dxy, which depends only on the avg_dxy column.
I found this reply from Julian to a colleague who had a fairly similar issue:
"Stacks doesn’t use the
reference genome for any inferences beyond alignment location; that is,
genotype/haplotype data can only come from the samples in the analysis.
If data are missing from an individual, they are excluded from the Dxy
calculation"
If this is the case, I would expect different dxy values with and without invariant sites, because invariant sites are a crucial factor in the dxy calculation in pixy.
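To make this expectation concrete, here is a minimal sketch, assuming (as I understand it) that dxy for a window is the total number of pairwise allele differences between the two populations divided by the total number of pairwise comparisons over all sites in the window. The function name dxy_window and the toy genotypes are made up for illustration; this is not pixy's actual code.

```python
from itertools import product

def dxy_window(sites):
    """Return dxy for one window.

    sites: list of (pop1_alleles, pop2_alleles) tuples, one per site.
    Alleles are ints (0 = ref, 1 = alt); missing alleles are None.
    """
    diffs = 0
    comparisons = 0
    for pop1, pop2 in sites:
        # Missing alleles are dropped, so they add no comparisons.
        a = [x for x in pop1 if x is not None]
        b = [x for x in pop2 if x is not None]
        for x, y in product(a, b):
            comparisons += 1
            diffs += (x != y)
    return diffs / comparisons if comparisons else float("nan")

# Toy data (hypothetical): one variant site and nine invariant sites.
variant_sites = [([0, 0, 1, 1], [1, 1, 1, 0])]
invariant_sites = [([0, 0, 0, 0], [0, 0, 0, 0])] * 9

print(dxy_window(variant_sites))                    # variants only
print(dxy_window(variant_sites + invariant_sites))  # all sites
```

Under this assumption, invariant sites add comparisons but no differences, so the all-sites value should be lower than the variants-only value (here 0.05 versus 0.5).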
Commands:
SNPs only
pixy --stats pi fst dxy \
  --vcf populations.snps_autosomes38_NO_200kb.vcf.gz \
  --populations foxes_popmap95_grouped_as_WG.txt \
  --bypass_invariant_check yes \
  --window_size 20000
All sites
pixy --stats pi fst dxy \
  --vcf populations.all_autosomes38_NO_200kb.vcf.gz \
  --populations foxes_popmap95_grouped_as_WG.txt \
  --window_size 20000
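One way to check the two runs window by window is sketched below. It assumes the pixy dxy output is a tab-separated table with the columns pop1, pop2, chromosome, window_pos_1, window_pos_2, avg_dxy, no_sites, count_diffs, count_comparisons and count_missing. The helper names and the toy rows are hypothetical and only illustrate the comparison; they are not my real results.

```python
import csv
import io

KEY = ("pop1", "pop2", "chromosome", "window_pos_1", "window_pos_2")
COLS = ("avg_dxy", "no_sites", "count_diffs", "count_comparisons", "count_missing")

def read_dxy(handle):
    """Index a tab-separated pixy dxy table by population pair and window."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {tuple(row[k] for k in KEY): row for row in reader}

def differing_columns(snps, allsites):
    """For each window present in both runs, list the columns whose values differ."""
    return {
        key: [c for c in COLS if snps[key][c] != allsites[key][c]]
        for key in snps.keys() & allsites.keys()
    }

# Toy tables with made-up numbers, standing in for the two output files.
header = "\t".join(KEY + COLS)
snps_txt = header + "\nA\tB\tchr1\t1\t20000\t0.25\t4\t8\t32\t0\n"
all_txt = header + "\nA\tB\tchr1\t1\t20000\t0.25\t4\t8\t32\t12\n"

snps = read_dxy(io.StringIO(snps_txt))
allsites = read_dxy(io.StringIO(all_txt))
report = differing_columns(snps, allsites)
print(report)
```

With real output, the StringIO objects would be replaced by open file handles on the two dxy tables. If no_sites and count_comparisons come out identical between the runs, that would suggest the invariant sites are not being counted as comparisons.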
I have attached screenshots of the VCF files showing the layout of 1) all sites.vcf and 2) snps only.vcf, in addition to .xlsx files of 3) pixy_dxy for all sites and 4) pixy_dxy for SNPs only.
Any input on this would be highly appreciated before I invest more time in it.
It could be that this is the actual biological signal for this kind of data, because, as you know, ddRAD samples only a subset of the genome. I'm just worried that this is a technical issue with the input data from Stacks or with the pixy pipeline, which could result in underestimation of dxy.
I understand that this issue is more related to the pixy pipeline than to Stacks, but I hope that some input from here could help. I have also posted this in the pixy group.
Thanks
Ali