MAF Filter


Sean Gallagher

Mar 25, 2019, 11:25:17 AM
to plink2-users
Hello, I am trying to better understand how the MAF filter works when running --make-bed: specifically, how the MAF is calculated and how missing values are handled. Below is the output of the commands I ran. I am also curious about what filtering is happening, since I should have ~72k markers, yet only ~68k are loaded from the .bim file.

Any insight on how these filters work would be greatly appreciated!

PLINK v1.90b6.7 64-bit (2 Dec 2018)            www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05.log.
Options in effect:
  --allow-no-sex
  --double-id
  --file /tmp/tmp7f3caq81/scrap/plink/plink
  --horse
  --maf 0.05
  --make-bed
  --not-chr X
  --out /tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05

15958 MB RAM detected; reserving 7979 MB for main workspace.
Allocated 5984 MB successfully, after larger attempt(s) failed.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (68160 variants, 190 horses).
--file: /tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05-temporary.bed +
/tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05-temporary.bim +
/tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05-temporary.fam written.
68160 variants loaded from .bim file.
190 horses (0 males, 0 females, 190 ambiguous) loaded from .fam.
Ambiguous sex IDs written to /tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05.nosex
.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 190 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.980049.
12573 variants removed due to minor allele threshold(s)
(--maf/--max-maf/--mac/--max-mac).
55587 variants and 190 horses pass filters and QC.
Note: No phenotypes present.
--make-bed to /tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05.bed +
/tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05.bim +
/tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05.fam ... done.
PLINK v1.90b6.7 64-bit (2 Dec 2018)            www.cog-genomics.org/plink/1.9/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /tmp/tmp7f3caq81/scrap/vcfs/plink_0.05.log.
Options in effect:
  --allow-no-sex
  --bfile /tmp/tmp7f3caq81/scrap/plink_bin/plink_0.05
  --horse
  --not-chr X
  --out /tmp/tmp7f3caq81/scrap/vcfs/plink_0.05
  --recode vcf

15958 MB RAM detected; reserving 7979 MB for main workspace.
Allocated 5984 MB successfully, after larger attempt(s) failed.
55587 variants loaded from .bim file.
190 horses (0 males, 0 females, 190 ambiguous) loaded from .fam.
Ambiguous sex IDs written to /tmp/tmp7f3caq81/scrap/vcfs/plink_0.05.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 190 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.976796.
55587 variants and 190 horses pass filters and QC.
Note: No phenotypes present.

Christopher Chang

Mar 25, 2019, 12:10:46 PM
to plink2-users
Can you explain exactly why you "should have ~72k markers"?  This log indicates that you started with 68160; this is before the MAF filter even comes up.

MAF is computed in the obvious way.  If there are 100 homozygous-major genotypes, 50 heterozygous genotypes, 10 homozygous-minor, and 30 missing, the MAF is (10 * 2 + 50) / (160 * 2) = 70 / 320 = 0.21875.
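The arithmetic above can be reproduced with a small Python sketch. The function name and its parameters are hypothetical and are not part of PLINK; the sketch only assumes what the reply states, namely that missing genotypes are left out of the denominator. Taking the smaller of the two allele frequencies is an added convenience so the result does not depend on which allele the caller labels as minor.

```python
def minor_allele_frequency(hom_major, het, hom_minor, missing=0):
    """Illustrative MAF calculation from genotype counts (not PLINK's code).

    `missing` is accepted only to make explicit that missing genotypes
    do not contribute to the numerator or the denominator.
    """
    called = hom_major + het + hom_minor  # non-missing genotypes only
    if called == 0:
        raise ValueError("no non-missing genotypes; MAF is undefined")
    # Each homozygous-minor genotype carries two copies of the allele,
    # each heterozygous genotype carries one.
    allele_copies = 2 * hom_minor + het
    freq = allele_copies / (2 * called)
    # Report the frequency of whichever allele is actually rarer.
    return min(freq, 1 - freq)


maf = minor_allele_frequency(hom_major=100, het=50, hom_minor=10, missing=30)
print(maf)           # 0.21875
print(maf >= 0.05)   # True: this variant would not fall below a 0.05 threshold
```

Changing the `missing` argument leaves the result unchanged, which is the point of the example: only called genotypes are counted.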

Sean Gallagher

Mar 25, 2019, 12:18:39 PM
to plink2-users
Okay, thank you for your quick reply. I think I should have ~72k markers because that is how many markers are in the .ped file. It is possible that there is a bug in how I generate the .map file, but I didn't think so. Does the --make-bed command do any inherent filtering?

Christopher Chang

Mar 25, 2019, 12:25:38 PM
to plink2-users
Not unless you specify a chromosome filter.  Can you please explain why you think there are ~72k markers in the .ped and .map files, with a "wc -l" result on the .map file or similar command-line/log output?

Sean Gallagher

Mar 25, 2019, 12:30:20 PM
to plink2-users
Running wc -l on the .map file gives 71890, but I believe you just solved it: I had forgotten that I was using the --horse and --not-chr X flags, so those must account for the markers that are filtered out beforehand. Thanks so much for your help, and I apologize for the confusion!
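For anyone who wants to check this kind of discrepancy directly, here is a rough Python sketch that tallies variants per chromosome from .map-style lines and counts how many remain after dropping a chosen set of chromosome codes. The function names and the example lines are made up for illustration. The sketch assumes the chromosome code is the first whitespace-separated field of each line, and it does not reproduce PLINK's own chromosome-code handling; in particular, X may be written differently in a given file, so the set of excluded codes is left to the caller.

```python
from collections import Counter


def chromosome_counts(map_lines):
    """Count variants per chromosome code (first field of each .map line)."""
    counts = Counter()
    for line in map_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        counts[fields[0]] += 1
    return counts


def variants_remaining(counts, excluded_codes):
    """Number of variants left after dropping the given chromosome codes."""
    return sum(n for chrom, n in counts.items() if chrom not in excluded_codes)


# Hypothetical .map content: chromosome, variant ID, cM position, bp position.
example_map = [
    "1 snp_a 0 1000",
    "1 snp_b 0 2000",
    "X snp_c 0 3000",
    "X snp_d 0 4000",
    "31 snp_e 0 5000",
]

counts = chromosome_counts(example_map)
print(sum(counts.values()))                # 5, the same figure wc -l would give here
print(variants_remaining(counts, {"X"}))   # 3
```

With a real file, the lines could come from `open("plink.map")` (path hypothetical), and the remaining count could then be compared with the "variants loaded from .bim file" line in the log.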

Christopher Chang

Mar 25, 2019, 12:56:29 PM
to plink2-users
No problem; my fault for not noticing the "--not-chr X" in your initial command line!