Hi All,
I noticed some quite strange behavior when calculating r2, with both PLINK 1.9 and PLINK 2.0:
If all individuals are heterozygous at a site (i.e. 0/1 or 1/0), the r2 between this site and every other site is not reported.
A test dataset is attached (Test1.vcf). In this dataset:
- SNPs 1 and 2 have the same genotypes, but with the alternate allele on different strands.
- SNPs 2 and 3 have the same genotypes, except that the first individual is 0/0 instead of 0/1 at SNP 3.
- SNPs 3 and 4 have exactly the same genotypes.
- All individuals are heterozygous at SNPs 1 and 2.
- All but one individual are heterozygous at SNPs 3 and 4.
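To illustrate the pattern, here is a minimal Python sketch. It assumes that unphased r2 is the squared Pearson correlation of per-individual ALT-allele counts (my reading of --r2-unphased, not checked against the PLINK source). It uses hypothetical genotypes for five individuals that mimic the list above rather than the real Test1.vcf, and it ignores missing calls and multi-allelic sites:

```python
def dosage(gt: str) -> int:
    """Count ALT alleles in a biallelic GT string such as '0/1' or '1|0'.

    Assumption: no missing calls ('./.') and no multi-allelic sites.
    """
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for a in alleles if a == "1")


def r2_unphased(gts_a, gts_b):
    """Squared Pearson correlation of ALT-allele counts at two sites.

    Returns None when either site has zero variance, because the
    correlation is then 0/0 and undefined.
    """
    x = [dosage(g) for g in gts_a]
    y = [dosage(g) for g in gts_b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


# Hypothetical genotypes for 5 individuals (not the attached file).
snp1 = ["0/1"] * 5                               # everyone heterozygous
snp2 = ["1/0"] * 5                               # same, ALT on the other strand
snp3 = ["0/0", "0/1", "0/1", "0/1", "0/1"]       # first individual 0/0
snp4 = list(snp3)                                # identical to SNP 3

print(r2_unphased(snp1, snp2))  # None
print(r2_unphased(snp2, snp3))  # None
print(r2_unphased(snp3, snp4))  # 1.0
```

Under this assumption, 0/1 and 1/0 both count as one ALT allele, so an all-heterozygous site has a constant count and zero variance. With these inputs only the SNP 3/SNP 4 pair gives a defined value, mirroring the PLINK 1.9 output described below.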
The commands I used:
plink1.9 --vcf Test.vcf --maf 0.01 --r2 --out TestData
plink2 --vcf Test.vcf --maf 0.01 --r2-unphased --out TestData2
The PLINK 1.9 output is attached. Only the LD between SNPs 3 and 4 is reported, whereas all sites are in very high LD.
Is there something important that I am missing, or is there really a bug somewhere?
Thank you very much,
Paul