This is the run without QC filtering in PLINK; the average F was 0.247:
plink --bfile FjordAll --chr-set 32 --het
15942 MB RAM detected; reserving 7971 MB for main workspace.
505601 variants loaded from .bim file.
423 samples (202 males, 221 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 422 nonfounders present.
Calculating allele frequencies... done.
Warning: 632704 het. haploid genotypes present (see plink.hh ); many commands
treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.997282.
505601 variants and 423 samples pass filters and QC.
Note: No phenotypes present.
--het: 131118 variants scanned, report written to plink.het .
This is the run on the QC-filtered data; the average F is around 0.25:
plink --bfile fjordhest-clear --chr-set 32 --het
15942 MB RAM detected; reserving 7971 MB for main workspace.
486676 variants loaded from .bim file.
423 samples (202 males, 221 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 422 nonfounders present.
Calculating allele frequencies... done.
Warning: 619609 het. haploid genotypes present (see plink.hh ); many commands
treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.997365.
486676 variants and 423 samples pass filters and QC.
Note: No phenotypes present.
--het: 130154 variants scanned, report written to plink.het .
And this is the run with the QC filters applied in the same command line as --het. The average F is somewhere around 0.003, and many of the individual F values are negative:
plink --bfile FjordAll --chr-set 32 --allow-no-sex --nonfounders --geno 0.05 --mind 0.05 --hwe 0.0001 --het
15942 MB RAM detected; reserving 7971 MB for main workspace.
505601 variants loaded from .bim file.
423 samples (202 males, 221 females) loaded from .fam.
0 samples removed due to missing genotype data (--mind).
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 422 nonfounders present.
Calculating allele frequencies... done.
Warning: 632704 het. haploid genotypes present (see plink.hh ); many commands
treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.997282.
1 variant removed due to missing genotype data (--geno).
Warning: --hwe observation counts vary by more than 10%, due to the X
chromosome. You may want to use a less stringent --hwe p-value threshold for X
chromosome variants.
--hwe: 18924 variants removed due to Hardy-Weinberg exact test.
486676 variants and 423 samples pass filters and QC.
Note: No phenotypes present.
--het: 472691 variants scanned, report written to plink.het .
The steps should be the same, but the results are very different. I would appreciate any tips on understanding why this happens and which approach is the correct one to proceed with. The last option looks the most reasonable in terms of the number of variants scanned, but it is strange that the results are so different.
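For reference, the averages quoted above can be reproduced with something like the following. This is only a minimal sketch, assuming the usual PLINK 1.9 .het columns (FID, IID, O(HOM), E(HOM), N(NM), F) and that F is reported per sample as (O(HOM) - E(HOM)) / (N(NM) - E(HOM)), which is negative whenever a sample has fewer homozygous genotypes than expected. The function names are illustrative and not part of PLINK.

```python
import math
from statistics import mean


def recompute_f(o_hom, e_hom, n_nm):
    """Recompute F from observed/expected homozygote counts and non-missing calls."""
    denom = n_nm - e_hom
    return float("nan") if denom == 0 else (o_hom - e_hom) / denom


def parse_het(lines):
    """Parse the whitespace-delimited rows of a .het report into dicts."""
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line holds the column names
            continue
        rec = dict(zip(header, fields))
        rows.append({
            "IID": rec["IID"],
            "O": float(rec["O(HOM)"]),
            "E": float(rec["E(HOM)"]),
            "N": float(rec["N(NM)"]),
            "F": float(rec["F"]),
        })
    return rows


def summarize_het(lines):
    """Return sample count, mean F, number of negative F values, and mean N(NM)."""
    rows = parse_het(lines)
    f_values = [r["F"] for r in rows if not math.isnan(r["F"])]
    return {
        "samples": len(rows),
        "mean_F": mean(f_values),
        "negative_F": sum(1 for f in f_values if f < 0),
        "mean_N": mean(r["N"] for r in rows),
    }


def summarize_het_file(path):
    """Convenience wrapper: summarize a .het report stored at `path`."""
    with open(path) as handle:
        return summarize_het(handle)
```

All three runs above wrote their report to plink.het, so each file would have to be renamed (or written with a different --out prefix) before the next run. Calling summarize_het_file("run1.het") and so on, with hypothetical file names, then gives the mean F, the number of negative F values, and the mean N(NM) for each run side by side.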
Thank you for your help!
With best regards,
Saija Tenhunen