Sorry about that. Here is the detailed information.
I'm working with the UKB British White cohort with imputed genotypes; the dataset contains ~300K samples and over 60M variants. Imputation was originally carried out for all 500K UKB individuals, so some loci became monomorphic after I used --keep to retain only the unrelated British White samples. Here is the log from extracting the individuals and filtering:
###############################################
PLINK v2.00a3.1LM 64-bit Intel (19 May 2022)
Options in effect:
--keep sample_278781.list
--maf 5e-6
--make-bed
--out uBritish_Impute
--pfile Human_UKB_Impute ## note here we started from pfiles.
Hostname: server200
Working directory: /data5/
Start time: Fri Aug 5 21:00:45 2022
Random number seed: 1659704445
773821 MiB RAM detected; reserving 386910 MiB for main workspace.
Using up to 80 threads (change this with --threads).
487409 samples (264296 females, 222987 males, 126 ambiguous; 487409 founders)
loaded from Human_UKB_Impute.psam.
88620726 variants loaded from Human_UKB_Impute.pvar.
Note: No phenotype data present.
--keep: 278781 samples remaining.
278781 samples (149200 females, 129581 males; 278781 founders) remaining after
main filters.
Calculating allele frequencies... done.
23770186 variants removed due to allele frequency threshold(s)
(--maf/--max-maf/--mac/--max-mac).
64850540 variants remaining after main filters.
Writing uBritish_Impute.fam ... done.
Writing uBritish_Impute.bim ... done.
Writing uBritish_Impute.bed ... done.
###############################################
Then I checked the minor allele frequencies with plink 1.9, using this command:
###############################################
plink --bfile uBritish_Impute --freq --out uBritish_Impute
###############################################
Finally, I looked at the output uBritish_Impute.frq and found that numerous variants with MAF below the 5e-6 threshold, or with no polymorphism at all, still remain.
###############################################
CHR SNP A1 A2 MAF NCHROBS
1 rs534229142 A G 0.0004393 555432
1 rs537182016 A C 3.77e-05 557032
1 rs558604819 A G 6.997e-05 557404
1 rs561109771 G T 1.076e-05 557562
1 rs574746232 G T 3.587e-06 557534
1 rs552314247 C G 1.256e-05 557518
1 rs562993331 A G 0.000131 557402
1 rs548333521 A G 0 557558
1 rs568318295 T C 4.663e-05 557528
###############################################
In total, 10,359,651 variants without polymorphism (MAF = 0) still exist.
###############################################
> awk '$5==0{print $0}' uBritish_Impute.frq | head
1 rs548333521 A G 0 557558
1 rs548087592 T C 0 557510
1 rs540466151 G T 0 557562
1 rs561913721 A G 0 557500
1 rs552113149 C G 0 557510
1 rs556492625 T G 0 557562
1 rs550231014 C T 0 557562
1 rs556025965 T C 0 557392
1 rs575961614 A G 0 557488
1 rs563966321 A G 0 557498
>
> awk '$5==0{print $0}' uBritish_Impute.frq | wc -l
10359651
###############################################
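As a sanity check on the threshold itself, here is a minimal awk sketch (not part of the runs above) that counts the variants in a PLINK 1.9 .frq file whose MAF falls below a cutoff. It assumes the six-column layout shown earlier with a single header line; the helper name count_below_maf is made up for illustration. With 278781 samples there are at most 557562 chromosomes, so 5e-6 corresponds to a minor allele count of about 2.8, meaning a variant seen fewer than 3 times in the hard calls should not pass --maf 5e-6.

```shell
# Count variants in a PLINK 1.9 .frq file with MAF below a cutoff.
# Assumes columns: CHR SNP A1 A2 MAF NCHROBS, with one header line.
# count_below_maf is a hypothetical helper name, not a PLINK feature.
count_below_maf() {
    frq_file=$1
    cutoff=$2
    # NR > 1 skips the header; rows with MAF reported as NA are ignored.
    # Adding 0 forces a numeric comparison, so 3.587e-06 is treated
    # as a number rather than as a string.
    awk -v cutoff="$cutoff" '
        NR > 1 && $5 != "NA" && ($5 + 0) < (cutoff + 0) { n++ }
        END { print n + 0 }
    ' "$frq_file"
}

# Usage: count_below_maf uBritish_Impute.frq 5e-6
```

Setting the cutoff to 5e-6 counts everything that the filter was expected to remove, not only the MAF = 0 variants counted above.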
An extra test:
I extracted the first 10 variants from the listing above, of which rs574746232 and rs548333521 should be removed, and then applied the same filter.
###############################################
PLINK v2.00a3.1LM 64-bit Intel (19 May 2022)
www.cog-genomics.org/plink/2.0/(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to test.log.
Options in effect:
--bfile uBritish.10snp
--maf 5e-6
--make-bed
--out test
Start time: Tue Sep 13 09:27:32 2022
773821 MiB RAM detected; reserving 386910 MiB for main workspace.
Allocated 290182 MiB successfully, after larger attempt(s) failed.
Using up to 80 threads (change this with --threads).
278781 samples (149200 females, 129581 males; 278781 founders) loaded from
uBritish.10snp.fam.
10 variants loaded from uBritish.10snp.bim.
Note: No phenotype data present.
Calculating allele frequencies... done.
2 variants removed due to allele frequency threshold(s)(--maf/--max-maf/--mac/--max-mac).
8 variants remaining after main filters.
Writing test.fam ... done.
Writing test.bim ... done.
Writing test.bed ... done.
End time: Tue Sep 13 09:27:32 2022
###############################################
Here the filter works as expected, which is odd. I suspect it may have something to do with the pfile conversion, or perhaps it is a bug in how plink2 handles tens of millions of variants.
Please check it out, thanks.
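P.S. In case it helps with reproducing the small test, a subset like uBritish.10snp can be built roughly as follows. This is only a sketch and not necessarily the exact commands I ran: the helper name first_n_ids and the list file first10.snplist are hypothetical, and the plink step relies on the standard --extract flag of plink 1.9.

```shell
# Print the first N variant IDs from a PLINK 1.9 .frq file.
# first_n_ids and first10.snplist are hypothetical names for illustration.
first_n_ids() {
    frq_file=$1
    n=$2
    # Stop reading once N data rows have been printed (row 1 is the
    # header), so the multi-million-line file is not scanned in full.
    awk -v n="$n" '
        NR > n + 1 { exit }
        NR > 1     { print $2 }
    ' "$frq_file"
}

# Usage (the plink step needs plink 1.9 on PATH):
# first_n_ids uBritish_Impute.frq 10 > first10.snplist
# plink --bfile uBritish_Impute --extract first10.snplist --make-bed --out uBritish.10snp
```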