Problems using the Hardy-Weinberg equilibrium test in PLINK: help needed


sohail.l...@gmail.com

Oct 14, 2016, 3:47:41 AM
to plink2-users
Hi,

I am new to PLINK. I am using it to run the Hardy-Weinberg equilibrium test so that I can filter out variants with excess heterozygosity in my WGS data. I am running the analysis chromosome by chromosome, one VCF file at a time.

My sample command is:

./plink --vcf VQSR_PHASE2_snp99.5-interval-chr2.vcf --hardy midp --out sohail

It outputs something like this:

CHR           SNP     TEST   A1   A2                 GENO   O(HET)   E(HET)            P 
   2   rs117922525  ALL(NP)    C    T               0/1/25  0.03846  0.03772          0.5
   2    rs57072359  ALL(NP)    A    G               0/1/25  0.03846  0.03772          0.5
   2   rs147294326  ALL(NP)    A    G               0/1/25  0.03846  0.03772          0.5
   2    rs10205197  ALL(NP)    C    G               2/7/17   0.2692   0.3336       0.1686
   2             .  ALL(NP)    G    C               0/1/25  0.03846  0.03772          0.5

When I then tried:

 ./plink --vcf VQSR_PHASE2_snp99.5-interval-chr2.vcf --hardy --hwe 0.05 midp --out sohail2

the variants with a P-value < 0.05 were not filtered out. Instead, their P-values changed, like this:
 CHR           SNP     TEST   A1   A2                 GENO   O(HET)   E(HET)            P 
   2   rs117922525  ALL(NP)    C    T               0/1/25  0.03846  0.03772            1
   2    rs57072359  ALL(NP)    A    G               0/1/25  0.03846  0.03772            1
   2   rs147294326  ALL(NP)    A    G               0/1/25  0.03846  0.03772            1
   2    rs10205197  ALL(NP)    C    G               2/7/17   0.2692   0.3336       0.2878
   2             .  ALL(NP)    G    C               0/1/25  0.03846  0.03772            1

QUESTIONS:
As you can see, the second column gives only the SNP ID, but the VCF file contains many SNPs that do not have IDs. How can I output variants with their chromosomal coordinates, so I can filter them out easily?

And how do I use the --hwe option in the correct combination with the other flags?


I would really appreciate any answers.

Thanks!

--sohail 



Christopher Chang

Oct 14, 2016, 12:21:36 PM
to plink2-users
1. You need to use "--hardy midp --hwe 0.05 midp" to make both the --hardy report and the --hwe filter use midp-statistics; they are essentially independent of each other.  Sorry about the redundancy here.

2. If you're just dealing with SNPs, you can use --set-missing-var-ids (see https://www.cog-genomics.org/plink2/data#set_missing_var_ids ) to assign unique names to the unnamed ones.  (If you also have indels, you may need to use a more sophisticated procedure.)

3. Once you've named all your SNPs, you need to add e.g. --write-snplist to list the SNPs which survive the --hwe filter.  --hardy does not happen after --hwe; they essentially happen simultaneously, so you still see the (mid)p-values of the filtered-out SNPs in the --hardy report.
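
Putting the three points together, a single run might look roughly like the sketch below. The flags are the ones named above; the output prefix sohail_hwe, the PLINK/VCF/OUT variable names, and the dry-run fallback are assumptions added for illustration, and '@:#' is just one possible naming template (chromosome code, a colon, then the base-pair position).

```shell
# Sketch only: one PLINK 1.9 run combining the three suggestions above.
PLINK="${PLINK:-./plink}"                            # path to the plink binary (assumption)
VCF="${VCF:-VQSR_PHASE2_snp99.5-interval-chr2.vcf}"  # input file from the original post
OUT="${OUT:-sohail_hwe}"                             # hypothetical output prefix

# --set-missing-var-ids '@:#'  names each unnamed variant "chrom:position"
# --hardy midp                 writes the mid-p report to $OUT.hwe
# --hwe 0.05 midp              removes variants with mid-p value below 0.05
# --write-snplist              writes the IDs of surviving variants to $OUT.snplist
set -- "$PLINK" --vcf "$VCF" \
    --set-missing-var-ids '@:#' \
    --hardy midp \
    --hwe 0.05 midp \
    --write-snplist \
    --out "$OUT"

# Run only if the binary and the input are present; otherwise just show the command.
if [ -x "$PLINK" ] && [ -f "$VCF" ]; then
    "$@"
else
    echo "would run: $*"
fi
```

The .hwe report from such a run would still list every variant, including the ones --hwe removed, while the .snplist file holds only the survivors. Leaving "midp" off --hardy makes the report fall back to the unadjusted p-values, which is why the values in the second run above differ from the first.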

sohail.l...@gmail.com

Oct 16, 2016, 11:01:44 PM
to plink2-users
Thanks Christopher.. It worked fine for me... :)