Differences between --maf in plink1.9 and plink2

2181...@zju.edu.cn

Sep 12, 2022, 4:43:17 AM
to plink2-users
Dear all, 
  I noticed that --maf in plink2 failed to filter out many variants with minor allele frequency below the given threshold (not all, but many variants without polymorphism still remain). After reading the manual, I found that "maf" in plink1.9 is explained as "minor allele frequencies/counts", while "maf" in plink2 is explained as "allele frequencies/counts". Is there any difference between using it in plink1.9 and in plink2? Or is the filtering failure in plink2 a bug?

Christopher Chang

Sep 12, 2022, 2:35:33 PM
to plink2-users
Sorry, but this post is missing too much information for me to even have any idea what you're asking.

2181...@zju.edu.cn

Sep 12, 2022, 9:39:45 PM
to plink2-users
Sorry for that. Here's the detailed information.

I'm working with the UKB British White cohort with imputed genotypes; the dataset contains ~300K samples and over 60M variants. The imputation was originally carried out for all 500K UKB individuals, so loci without polymorphism appeared after using --keep to retain the unrelated British White samples. Here is the log from extracting the individuals and filtering:

###############################################
PLINK v2.00a3.1LM 64-bit Intel (19 May 2022)
Options in effect:
  --keep sample_278781.list
  --maf 5e-6
  --make-bed
  --out uBritish_Impute
  --pfile Human_UKB_Impute ## note here we started from pfiles.

Hostname: server200
Working directory: /data5/
Start time: Fri Aug  5 21:00:45 2022

Random number seed: 1659704445
773821 MiB RAM detected; reserving 386910 MiB for main workspace.
Using up to 80 threads (change this with --threads).
487409 samples (264296 females, 222987 males, 126 ambiguous; 487409 founders)
loaded from Human_UKB_Impute.psam.
88620726 variants loaded from Human_UKB_Impute.pvar.
Note: No phenotype data present.
--keep: 278781 samples remaining.
278781 samples (149200 females, 129581 males; 278781 founders) remaining after
main filters.
Calculating allele frequencies... done.
23770186 variants removed due to allele frequency threshold(s)
(--maf/--max-maf/--mac/--max-mac).
64850540 variants remaining after main filters.
Writing uBritish_Impute.fam ... done.
Writing uBritish_Impute.bim ... done.
Writing uBritish_Impute.bed ... done.
###############################################

Then I checked the minor allele frequencies with plink1.9, using the command:

###############################################
plink --bfile uBritish_Impute --freq --out uBritish_Impute
###############################################

Finally, I looked at the output uBritish_Impute.frq and found that numerous variants with low or no polymorphism still remain.

###############################################
 CHR              SNP   A1   A2          MAF  NCHROBS
   1      rs534229142    A    G    0.0004393   555432
   1      rs537182016    A    C     3.77e-05   557032
   1      rs558604819    A    G    6.997e-05   557404
   1      rs561109771    G    T    1.076e-05   557562
   1      rs574746232    G    T    3.587e-06   557534
   1      rs552314247    C    G    1.256e-05   557518
   1      rs562993331    A    G     0.000131   557402
   1      rs548333521    A    G            0   557558
   1      rs568318295    T    C    4.663e-05   557528
###############################################

A total of 10,359,651 variants without polymorphism still exist.
###############################################
> awk '$5 == 0' uBritish_Impute.frq | head
   1      rs548333521    A    G            0   557558
   1      rs548087592    T    C            0   557510
   1      rs540466151    G    T            0   557562
   1      rs561913721    A    G            0   557500
   1      rs552113149    C    G            0   557510
   1      rs556492625    T    G            0   557562
   1      rs550231014    C    T            0   557562
   1      rs556025965    T    C            0   557392
   1      rs575961614    A    G            0   557488
   1      rs563966321    A    G            0   557498

> awk '$5 == 0' uBritish_Impute.frq | wc -l
10359651
###############################################
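
A similar check can be made against the --maf threshold itself rather than against exactly zero. The following is only a sketch: it rebuilds a toy .frq file from the nine rows quoted earlier in this message (the temporary file and the variable names are made up for illustration) and counts the rows whose MAF column is below 5e-6, skipping the header and any NA entries.

```shell
# Toy .frq file holding the nine rows quoted above (PLINK 1.9 --freq layout).
frq=$(mktemp)
cat > "$frq" <<'EOF'
 CHR              SNP   A1   A2          MAF  NCHROBS
   1      rs534229142    A    G    0.0004393   555432
   1      rs537182016    A    C     3.77e-05   557032
   1      rs558604819    A    G    6.997e-05   557404
   1      rs561109771    G    T    1.076e-05   557562
   1      rs574746232    G    T    3.587e-06   557534
   1      rs552314247    C    G    1.256e-05   557518
   1      rs562993331    A    G     0.000131   557402
   1      rs548333521    A    G            0   557558
   1      rs568318295    T    C    4.663e-05   557528
EOF

# Count variants whose MAF (column 5) is strictly below the threshold.
# NR > 1 skips the header; the NA test skips variants with no frequency.
below=$(awk -v thr=5e-6 'NR > 1 && $5 != "NA" && $5 + 0 < thr { n++ } END { print n + 0 }' "$frq")
echo "variants below threshold: $below"   # prints 2 for the rows above

rm -f "$frq"
```

The two rows counted are rs574746232 and rs548333521.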

An extra test.
I extracted the first 10 variants shown just now, of which we know rs574746232 and rs548333521 should be removed, and then applied the filter.
###############################################
PLINK v2.00a3.1LM 64-bit Intel (19 May 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test.log.
Options in effect:
  --bfile uBritish.10snp
  --maf 5e-6
  --make-bed
  --out test

Start time: Tue Sep 13 09:27:32 2022
773821 MiB RAM detected; reserving 386910 MiB for main workspace.
Allocated 290182 MiB successfully, after larger attempt(s) failed.
Using up to 80 threads (change this with --threads).
278781 samples (149200 females, 129581 males; 278781 founders) loaded from
uBritish.10snp.fam.
10 variants loaded from uBritish.10snp.bim.
Note: No phenotype data present.
Calculating allele frequencies... done.
2 variants removed due to allele frequency threshold(s)
(--maf/--max-maf/--mac/--max-mac).
8 variants remaining after main filters.
Writing test.fam ... done.
Writing test.bim ... done.
Writing test.bed ... done.
End time: Tue Sep 13 09:27:32 2022
###############################################
It works fine here, which is odd. I suspect it has something to do with the pfile conversion, or it may be a bug in plink2's handling of tens of millions of variants.

Please check it out, thanks.

Chris Chang

Sep 12, 2022, 9:59:42 PM
to 2181...@zju.edu.cn, plink2-users
If the pfile contains dosages, that would explain what you're seeing.  plink2 uses dosages when computing allele frequencies, and those dosages are lost during --make-bed.
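
To illustrate the point with toy numbers, the sketch below compares an allele frequency computed from dosages with one computed after the same dosages are reduced to hard calls. The five dosage values are made up, and rounding each dosage to the nearest integer is a simplifying assumption, not plink2's actual hard-call rule.

```shell
# Hypothetical alternate-allele dosages for one variant in five samples.
dosages="0.02 0.01 0.03 0.00 0.04"

# Frequency from dosages: sum of dosages / (2 * number of samples).
dosage_af=$(echo "$dosages" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.6g\n", s / (2 * NF) }')

# Frequency from hard calls: round each dosage to 0, 1 or 2 first
# (simplifying assumption), then apply the same formula.
hardcall_af=$(echo "$dosages" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += int($i + 0.5); printf "%.6g\n", s / (2 * NF) }')

echo "dosage-based AF: $dosage_af"
echo "hard-call AF:    $hardcall_af"
```

With these values the dosage-based frequency is 0.01, so the variant would pass a 5e-6 threshold, while the hard-call frequency is 0. That is the same pattern as the variants that survived --maf but show MAF 0 in the plink1.9 .frq output.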

--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/plink2-users/01c2fd7e-ec44-4417-8eb9-595582f5a73en%40googlegroups.com.

2181...@zju.edu.cn

Sep 13, 2022, 3:57:15 AM
to plink2-users
Thanks, that explains it.

I think both dosage mode and genotype mode are necessary, since many downstream analyses are still based on genotype information or accept only the plink1 binary fileset. For me, as a regular plink1.9 user, the transition to plink2 --maf was not very smooth. If people use plink2 to filter variants by allele frequency and then pass bed/bim/fam output containing many non-polymorphic variants to other software, the results could be misleading.


Chris Chang

Sep 13, 2022, 10:05:18 AM
to 2181...@zju.edu.cn, plink2-users
It is straightforward to run "--make-pgen erase-dosage" to create a genotype-only .pgen, whenever that matters.
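
One way this could fit into the workflow from earlier in the thread is sketched below, as an assumption about usage rather than a recommended recipe. The function name and the "_hardcall" suffix are hypothetical. The flags are the ones already quoted in this thread. The filter runs as a second pass so that allele frequencies are computed on the genotype-only fileset.

```shell
# Hypothetical helper: apply --maf to hard calls only, in two passes.
#   $1 = input pfile prefix      $2 = sample list for --keep
#   $3 = MAF threshold           $4 = output prefix
filter_on_hard_calls() {
    in_prefix=$1
    keep_list=$2
    maf=$3
    out_prefix=$4

    # Pass 1: subset the samples and write a genotype-only .pgen (dosages erased).
    plink2 --pfile "$in_prefix" --keep "$keep_list" \
           --make-pgen erase-dosage --out "${out_prefix}_hardcall" || return 1

    # Pass 2: frequencies now come from genotypes alone, so the filter
    # should agree with what is later written to the .bed file.
    plink2 --pfile "${out_prefix}_hardcall" --maf "$maf" \
           --make-bed --out "$out_prefix"
}
```

With the file names from the original log, the call would be `filter_on_hard_calls Human_UKB_Impute sample_278781.list 5e-6 uBritish_Impute`.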


2181...@zju.edu.cn

Sep 13, 2022, 8:41:02 PM
to plink2-users

That works. Thank you very much for your help!

Maulik Patel

Oct 3, 2022, 1:20:00 PM
to plink2-users
Yes, I am facing the same issue when comparing Plink 1.9 and Plink 2. Plink 2 with bed files and Plink 2 with pfiles give different results at each filtering step, such as HWE, MAF, and GENO.

Christopher Chang

Oct 3, 2022, 2:20:38 PM
to plink2-users
Please read the rest of this thread, and provide a very clear explanation (including full .log output) of why it doesn't already answer your question.