Plink 1.9 --het command issue

Saija Tenhunen

Feb 20, 2018, 9:44:34 AM
to plink2-users
Hi,

I have run into the following issue with the --het command. It seems to filter out a large fraction of the SNPs when performing the calculation. This doesn't happen if I include the QC filtering flags on the same command line as --het, and that option also includes more SNPs in the final calculation. I usually create QC-filtered binary files first, before running other analyses, because as I understand it, putting too many flags on one command line increases the chance of errors.
Here are the command lines and the PLINK 1.9 output. This is horse data.

This is without QC filtering in PLINK; average F was 0.247:
plink --bfile FjordAll --chr-set 32 --het
 
15942 MB RAM detected; reserving 7971 MB for main workspace.
505601 variants loaded from .bim file.
423 samples (202 males, 221 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 422 nonfounders present.
Calculating allele frequencies... done.
Warning: 632704 het. haploid genotypes present (see plink.hh ); many commands
treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.997282.
505601 variants and 423 samples pass filters and QC.
Note: No phenotypes present.
--het: 131118 variants scanned, report written to plink.het .
 
This is with the QC-filtered data; average F is around 0.25:
plink --bfile fjordhest-clear --chr-set 32 --het
 
15942 MB RAM detected; reserving 7971 MB for main workspace.
486676 variants loaded from .bim file.
423 samples (202 males, 221 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 422 nonfounders present.
Calculating allele frequencies... done.
Warning: 619609 het. haploid genotypes present (see plink.hh ); many commands
treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.997365.
486676 variants and 423 samples pass filters and QC.
Note: No phenotypes present.
--het: 130154 variants scanned, report written to plink.het .
 
And this is with the cleaning done on the same command line. Average F is somewhere around 0.003, and many of the values are negative:
plink --bfile FjordAll --chr-set 32 --allow-no-sex --nonfounders --geno 0.05 --mind 0.05 --hwe 0.0001 --het
 
15942 MB RAM detected; reserving 7971 MB for main workspace.
505601 variants loaded from .bim file.
423 samples (202 males, 221 females) loaded from .fam.
0 samples removed due to missing genotype data (--mind).
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 422 nonfounders present.
Calculating allele frequencies... done.
Warning: 632704 het. haploid genotypes present (see plink.hh ); many commands
treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.997282.
1 variant removed due to missing genotype data (--geno).
Warning: --hwe observation counts vary by more than 10%, due to the X
chromosome.  You may want to use a less stringent --hwe p-value threshold for X
chromosome variants.
--hwe: 18924 variants removed due to Hardy-Weinberg exact test.
486676 variants and 423 samples pass filters and QC.
Note: No phenotypes present.
--het: 472691 variants scanned, report written to plink.het .
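The negative values are not a problem in themselves. PLINK 1.9 documents --het as a method-of-moments estimate, ([observed hom. count] - [expected count]) / ([total observations] - [expected count]), so F falls below zero whenever a sample is less homozygous than expected. Below is a minimal Python sketch of that formula. It is not PLINK's code and makes several assumptions: variants are biallelic and coded as alt-allele counts, expected homozygosity is summed as 1 - 2p(1-p) per variant, and no small-sample correction is applied. The function name `het_f` is hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical encoding for this sketch: alt-allele count 0, 1 or 2; None = missing call.
Genotype = Optional[int]


def het_f(sample: Sequence[Genotype], freqs: Sequence[float]) -> float:
    """Method-of-moments F for one sample.

    F = (O_hom - E_hom) / (N - E_hom), where N is the number of usable
    (nonmissing, polymorphic) variants, O_hom the homozygous calls among them,
    and E_hom the sum of 1 - 2p(1-p) over the same variants.
    No small-sample correction is applied (an assumption of this sketch).
    """
    n = 0
    obs_hom = 0
    exp_hom = 0.0
    for g, p in zip(sample, freqs):
        # Skip missing calls and variants whose estimated MAF is zero.
        if g is None or p <= 0.0 or p >= 1.0:
            continue
        n += 1
        if g != 1:  # 0 or 2 alt copies = homozygous
            obs_hom += 1
        exp_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n == 0:
        raise ValueError("no usable variants for this sample")
    return (obs_hom - exp_hom) / (n - exp_hom)


if __name__ == "__main__":
    freqs = [0.5, 0.5, 0.5, 0.5]  # expected homozygosity is 0.5 per variant
    print(het_f([0, 2, 0, 2], freqs))  # all homozygous -> 1.0
    print(het_f([1, 1, 1, 1], freqs))  # all heterozygous -> -1.0
    print(het_f([0, 1, 2, 1], freqs))  # exactly as expected -> 0.0
```

When allele frequencies are estimated from the same population, a sample's F scatters around zero, so an average near 0 with many negative values is the expected pattern for an outbred population.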

So the steps should be the same, but the results are very different. I would appreciate any tips on why this happens and which approach is the correct one to proceed with. The last option looks the most reasonable given the variant counts, but it is strange that the results differ so much.

Thank you for your help! 

With best regards
Saija Tenhunen


Christopher Chang

Feb 20, 2018, 10:44:56 AM
to plink2-users
Unless you specify --nonfounders, minor allele frequencies are estimated from founders only. Since your dataset has only 1 founder, lots of variants have an estimated minor allele frequency of zero... and --het skips those.
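To make the mechanism concrete, here is a minimal Python sketch, assuming biallelic variants coded as alt-allele counts. It does not read PLINK files. The toy genotype matrix, the `IS_FOUNDER` flags (PLINK itself derives founder status from the parental ID columns of the .fam file), and the helper names `alt_freq` and `scanned_variants` are all hypothetical. With a single founder, every estimated frequency is 0, 0.5 or 1, so only variants where that founder is heterozygous survive. Counting all samples, as --nonfounders does, keeps every polymorphic variant.

```python
from typing import List, Optional, Sequence

# Hypothetical encoding for this sketch: alt-allele count 0, 1 or 2; None = missing call.
Genotype = Optional[int]

# Toy data: rows are samples, columns are variants. Only sample 0 is a founder.
TOY_GENOTYPES: List[List[Genotype]] = [
    [0, 1, 2, 1, 0],  # the single founder
    [1, 1, 2, 0, 0],
    [0, 2, 1, 1, 1],
    [1, 1, 2, 1, 0],
]
IS_FOUNDER = [True, False, False, False]


def alt_freq(genotypes: Sequence[Sequence[Genotype]], variant: int,
             use: Sequence[bool]) -> Optional[float]:
    """Alt-allele frequency at one variant, counting only samples with use[i] True."""
    alleles = 0
    alt = 0
    for row, keep in zip(genotypes, use):
        g = row[variant]
        if keep and g is not None:
            alleles += 2
            alt += g
    return alt / alleles if alleles else None


def scanned_variants(genotypes: Sequence[Sequence[Genotype]],
                     use: Sequence[bool]) -> List[int]:
    """Indices of variants whose estimated MAF is nonzero, i.e. not skipped."""
    kept = []
    for v in range(len(genotypes[0])):
        p = alt_freq(genotypes, v, use)
        if p is not None and 0.0 < p < 1.0:
            kept.append(v)
    return kept


if __name__ == "__main__":
    # Default behaviour: frequencies estimated from founders only.
    print(scanned_variants(TOY_GENOTYPES, IS_FOUNDER))  # [1, 3]
    # Like --nonfounders: frequencies estimated from every sample.
    print(scanned_variants(TOY_GENOTYPES, [True] * len(IS_FOUNDER)))  # [0, 1, 2, 3, 4]
```

In this toy model, every variant that survives the founder-only estimate has p = 0.5, so its expected homozygosity is exactly 0.5 regardless of the real population frequency. That is one plausible reason the first two runs report an inflated F of about 0.25.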

Saija Tenhunen

Feb 20, 2018, 5:01:35 PM
to plink2-users
Thank you! This explained the issue I was having :) 

-Saija- 