Hi,
I am new to PLINK. I am using it to run a Hardy-Weinberg equilibrium test so that I can filter out variants with excess heterozygosity in my WGS data. I am running the analysis on per-chromosome VCF files.
My sample command is:
./plink --vcf VQSR_PHASE2_snp99.5-interval-chr2.vcf --hardy midp --out sohail
It outputs something like this:
CHR SNP TEST A1 A2 GENO O(HET) E(HET) P
2 rs117922525 ALL(NP) C T 0/1/25 0.03846 0.03772 0.5
2 rs57072359 ALL(NP) A G 0/1/25 0.03846 0.03772 0.5
2 rs147294326 ALL(NP) A G 0/1/25 0.03846 0.03772 0.5
2 rs10205197 ALL(NP) C G 2/7/17 0.2692 0.3336 0.1686
2 . ALL(NP) G C 0/1/25 0.03846 0.03772 0.5
When I tried:
./plink --vcf VQSR_PHASE2_snp99.5-interval-chr2.vcf --hardy --hwe 0.05 midp --out sohail2
The variants with P-value < 0.05 were not filtered out. Instead, their P-values changed, like this:
CHR SNP TEST A1 A2 GENO O(HET) E(HET) P
2 rs117922525 ALL(NP) C T 0/1/25 0.03846 0.03772 1
2 rs57072359 ALL(NP) A G 0/1/25 0.03846 0.03772 1
2 rs147294326 ALL(NP) A G 0/1/25 0.03846 0.03772 1
2 rs10205197 ALL(NP) C G 2/7/17 0.2692 0.3336 0.2878
2 . ALL(NP) G C 0/1/25 0.03846 0.03772 1
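As a cross-check that does not depend on the --hwe filter, the P column of the report can be filtered directly. This is only a minimal sketch: it assumes the nine-column layout shown above (P in column 9) and that the first report was written to sohail.hwe; the helper name hwe_below is made up for illustration.

```shell
# hwe_below THRESHOLD: read a --hardy report on stdin and print the header
# plus every row whose P value (column 9, an assumption) is below THRESHOLD.
hwe_below() {
  awk -v thr="$1" '
    NR == 1 { print; next }                         # keep the header line
    $9 != "NA" && ($9 + 0) < (thr + 0) { print }    # numeric comparison on P
  '
}

# Usage (file name assumed from --out sohail):
#   hwe_below 0.05 < sohail.hwe
```

With the five rows shown above, a threshold of 0.05 would print only the header, since the smallest listed P-value is 0.1686.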
QUESTIONS:
As you can see, the second column gives only the SNP ID, but many SNPs in the VCF file do not have IDs. How can I output variants with their chromosomal coordinates so that I can filter them out easily?
And what is the correct way to combine the --hwe option with other flags?
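For the first question, one possible workaround is to give unnamed variants a CHROM:POS identifier before running PLINK. The sketch below assumes a tab-delimited, uncompressed VCF with the ID in column 3 and "." marking a missing ID; the helper name fill_missing_ids and the output file name are made up for illustration. The PLINK 1.9 documentation also describes a --set-missing-var-ids option that takes a template such as @:# for the same purpose, which may be simpler if it fits.

```shell
# fill_missing_ids: read a VCF on stdin and write it to stdout, replacing a
# missing ID (".") in column 3 with CHROM:POS. Header lines are untouched.
fill_missing_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }            # header lines pass through unchanged
       $3 == "." { $3 = $1 ":" $2 }    # missing ID becomes CHROM:POS
       { print }'
}

# Usage (output file name is hypothetical):
#   fill_missing_ids < VQSR_PHASE2_snp99.5-interval-chr2.vcf > chr2.with_ids.vcf
```

Variants that share a position would end up with identical IDs under this scheme, so the alleles may need to be appended as well.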
I would really appreciate any answers.
Thanks!
--sohail