PLINK 2.0 outputs MAFs > 0.5 using --freq?


Hasnat A

Jul 25, 2018, 9:51:43 AM
to plink2-users
Hi,

I calculated MAFs with PLINK 1.90 and PLINK 2.0 on the same input file with the same filters. I have copied and pasted the .log files and the output files below:

PLINK 1.90

-bash-4.2$ cat KL_gen_imp/snp_QC/called.25.1.9.log 
PLINK v1.90p 64-bit (16 Apr 2016)
Options in effect:
  --bfile /groupvol/med-bio/******/****/genotypes/***_cal_chr13_v2
  --chr 13
  --freq
  --from-bp 33590571
  --out /groupvol/med-bio/******/****/KL_gen_imp/snp_QC/called.25.1.9
  --to-bp 33640282

Hostname: login-3-internal
Working directory: /groupvol/med-bio/******/****
Start time: Wed Jul 25 14:10:41 2018

Random number seed: 1532524241
193488 MB RAM detected; reserving 96744 MB for main workspace.
13 variants loaded from .bim file.
488377 people (223506 males, 264857 females, 14 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
/groupvol/med-bio/******/****/KL_gen_imp/snp_QC/called.25.1.9.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 488377 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.995749.
--freq: Allele frequencies (founders only) written to
/groupvol/med-bio/******/****/KL_gen_imp/snp_QC/called.25.1.9.frq .

End time: Wed Jul 25 14:10:41 2018
-bash-4.2$ cat KL_gen_imp/snp_QC/called.25.1.9.frq 
 CHR             SNP   A1   A2          MAF  NCHROBS
  13       rs9315201    T    G      0.02868   976052
  13        rs385564    G    C        0.324   947324
  13        rs526906    A    G       0.1597   975754
  13        rs537313    G    A       0.3844   973512
  13        rs577912    T    G       0.1522   974090
  13     rs118136643    T    G      0.01641   969912
  13        rs554634    C    T       0.3086   974458
  13      rs17643609    T    C      0.01326   975728
  13       rs9536314    G    T       0.1602   975346
  13       rs9527025    C    G       0.1603   974732
  13     rs141741908    C    T    0.0005481   976062
  13   Affx-89011680   TG    T    4.097e-06   976244
  13     rs146235320    A    G     0.001433   974604

PLINK 2.0

-bash-4.2$ cat KL_gen_imp/snp_QC/called.25.2.0.log 
PLINK v2.00a1LM 64-bit Intel (11 Feb 2018)
Options in effect:
  --bfile /groupvol/med-bio/******/****/genotypes/***_cal_chr13_v2
  --chr 13
  --freq
  --from-bp 33590571
  --out /groupvol/med-bio/******/****/KL_gen_imp/snp_QC/called.25.2.0
  --to-bp 33640282

Hostname: login-3-internal
Working directory: /groupvol/med-bio/******/****/scans/***_KL_1/continuous
Start time: Wed Jul 25 14:04:49 2018

Random number seed: 1532523889
193488 MB RAM detected; reserving 96744 MB for main workspace.
Using up to 24 threads (change this with --threads).
488377 samples (264857 females, 223506 males, 14 ambiguous; 488377 founders)
loaded from /groupvol/med-bio/******/****/genotypes/***_cal_chr13_v2.fam.
26806 variants loaded from
/groupvol/med-bio/******/****/genotypes/***_cal_chr13_v2.bim.
1 categorical phenotype loaded (488377 values).
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to
/groupvol/med-bio/******/****/KL_gen_imp/snp_QC/called.25.2.0.afreq .

End time: Wed Jul 25 14:04:49 2018
-bash-4.2$ cat KL_gen_imp/snp_QC/called.25.2.0.afreq 
#CHROM  ID      REF     ALT     ALT_FREQS       OBS_CT
13      rs9315201       T       G       0.971316        976052
13      rs385564        G       C       0.676037        947324
13      rs526906        G       A       0.159667        975754
13      rs537313        G       A       0.615646        973512
13      rs577912        G       T       0.15217 974090
13      rs118136643     T       G       0.983585        969912
13      rs554634        T       C       0.308568        974458
13      rs17643609      T       C       0.986738        975728
13      rs9536314       G       T       0.839793        975346
13      rs9527025       C       G       0.8397  974732
13      rs141741908     C       T       0.999452        976062
13      Affx-89011680   TG      T       0.999996        976244
13      rs146235320     A       G       0.998567        974604

The MAFs calculated by PLINK 2.0 don't make sense because some of them are greater than 0.5; this problem does not occur when PLINK 1.90 is used. I re-ran the command using the latest release, but the problem still occurred.

PLINK 2.0 19.07.2018

-bash-4.2$ cat KL_gen_imp/snp_QC/called.25.2.0.log 
PLINK v2.00a2LM 64-bit Intel (19 Jul 2018)
Options in effect:
  --bfile /groupvol/med-bio/******/****/genotypes/***_cal_chr13_v2
  --chr 13
  --freq
  --from-bp 33590571
  --out /groupvol/med-bio/******/****/KL_gen_imp/snp_QC/called.25.2.0
  --to-bp 33640282

Hostname: login-3-internal
Working directory: /groupvol/med-bio/******
Start time: Wed Jul 25 14:39:19 2018

Random number seed: 1532525959
193488 MiB RAM detected; reserving 96744 MiB for main workspace.
Using up to 24 threads (change this with --threads).
488377 samples (264857 females, 223506 males, 14 ambiguous; 488377 founders)
loaded from /groupvol/med-bio/******/****/genotypes/***_cal_chr13_v2.fam.
26806 variants loaded from
/groupvol/med-bio/******/****/genotypes/***_cal_chr13_v2.bim.
1 categorical phenotype loaded (488377 values).
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to
/groupvol/med-bio/******/****/KL_gen_imp/snp_QC/called.25.2.0.afreq .

End time: Wed Jul 25 14:39:19 2018
-bash-4.2$ cat KL_gen_imp/snp_QC/called.25.2.0.afreq 
#CHROM  ID      REF     ALT     ALT_FREQS       OBS_CT
13      rs9315201       T       G       0.971316        976052
13      rs385564        G       C       0.676037        947324
13      rs526906        G       A       0.159667        975754
13      rs537313        G       A       0.615646        973512
13      rs577912        G       T       0.15217 974090
13      rs118136643     T       G       0.983585        969912
13      rs554634        T       C       0.308568        974458
13      rs17643609      T       C       0.986738        975728
13      rs9536314       G       T       0.839793        975346
13      rs9527025       C       G       0.8397  974732
13      rs141741908     C       T       0.999452        976062
13      Affx-89011680   TG      T       0.999996        976244
13      rs146235320     A       G       0.998567        974604

Does anybody know why PLINK 2.0 is outputting MAFs > 0.5? 

Many thanks,

Hasnat

Christopher Chang

Jul 25, 2018, 10:16:27 AM
Note the difference in column headers. PLINK 2.0 --freq reports alternate allele frequencies (ALT_FREQS), not minor allele frequencies. Alternate alleles are usually, but not always, minor.

(--maf and similar flags are still based on major/minor alleles, of course.)
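If you need MAFs from a PLINK 2.0 .afreq file, the minor allele frequency of a biallelic variant is just min(ALT_FREQS, 1 - ALT_FREQS). Below is a minimal Python sketch of that conversion, assuming a tab-separated, biallelic .afreq with the default columns shown above (#CHROM, ID, REF, ALT, ALT_FREQS, OBS_CT); the function name afreq_to_maf is hypothetical, not part of PLINK. Multiallelic rows (comma-separated ALT_FREQS) are not handled.

```python
import csv
import io
from typing import List, Tuple


def afreq_to_maf(afreq_text: str) -> List[Tuple[str, str, float]]:
    """Turn PLINK 2.0 .afreq rows into (ID, minor allele, MAF) tuples.

    Hypothetical helper. Assumes a biallelic, tab-separated .afreq with the
    default header seen in this thread: #CHROM, ID, REF, ALT, ALT_FREQS, OBS_CT.
    """
    reader = csv.DictReader(io.StringIO(afreq_text), delimiter="\t")
    results = []
    for row in reader:
        alt_freq = float(row["ALT_FREQS"])
        if alt_freq <= 0.5:
            # ALT is the minor allele (an exact 0.5 tie is reported as ALT here)
            results.append((row["ID"], row["ALT"], alt_freq))
        else:
            # REF is the minor allele; its frequency is the complement
            results.append((row["ID"], row["REF"], 1.0 - alt_freq))
    return results


if __name__ == "__main__":
    # Two rows copied from the PLINK 2.0 output above
    sample = (
        "#CHROM\tID\tREF\tALT\tALT_FREQS\tOBS_CT\n"
        "13\trs9315201\tT\tG\t0.971316\t976052\n"
        "13\trs526906\tG\tA\t0.159667\t975754\n"
    )
    for snp_id, minor, maf in afreq_to_maf(sample):
        # 4 significant digits, like the PLINK 1.90 .frq MAF column
        print(snp_id, minor, f"{maf:.4g}")
```

Run on those two rows, it prints "rs9315201 T 0.02868" and "rs526906 A 0.1597", the same minor alleles and MAFs that PLINK 1.90 wrote to the .frq file.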

Hasnat A

Jul 25, 2018, 10:59:04 AM
Thank you for your reply. From what I understand, --freq in PLINK 2.0 no longer returns MAFs, but rather returns the alternate allele frequency.

However, how does PLINK 2.0 decide which allele is alternate? And when PLINK 2.0 --glm is used, does the test refer to alternate alleles (i.e. reference allele homozygotes and alternate allele homozygotes are coded as 0 and 2, respectively) or minor alleles (i.e. major allele homozygotes and minor allele homozygotes are coded as 0 and 2, respectively)?

Christopher Chang

Jul 25, 2018, 11:44:05 AM
1. If you use PLINK 2.0 to import a VCF file, the alternate allele assignments are kept from there.  If you import a BGEN file, the first allele is assumed to be alternate and the last allele is assumed to be reference (this can be controlled with the 'ref-first'/'ref-last' modifiers).  If you're using a --bfile generated by PLINK 1.x, the alleles in the 5th column are treated as alternate; those will almost always be minor alleles, since PLINK 1.x swaps the 5th/6th columns as necessary to make 5th=minor unless you explicitly tell it not to do so.  (Warning: this means that, even if you use PLINK 2.0 to import a VCF, if you then operate on the imported data with PLINK 1.x, you'll destroy the alternate allele information if you aren't careful.)
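The .bim and BGEN rules in point 1 can be written down as a minimal Python sketch. It only models the behaviour as described in this 2018 reply (biallelic variants, BGEN default of first allele = ALT, 'ref-first' flipping it); the function and type names are hypothetical, and newer PLINK 2.0 builds may handle BGEN allele order differently.

```python
from typing import NamedTuple


class RefAlt(NamedTuple):
    ref: str
    alt: str


def ref_alt_from_bim_line(line: str) -> RefAlt:
    """PLINK 2.0 reading a PLINK 1 .bim: 5th column -> ALT, 6th column -> REF."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 .bim columns, got {len(fields)}")
    # .bim columns: chrom, variant ID, cM position, bp position, allele 1, allele 2
    return RefAlt(ref=fields[5], alt=fields[4])


def ref_alt_from_bgen_alleles(first: str, last: str, ref_first: bool = False) -> RefAlt:
    """Biallelic BGEN variant, as described above: the first allele is ALT and
    the last is REF, unless the 'ref-first' modifier is given."""
    if ref_first:
        return RefAlt(ref=first, alt=last)
    return RefAlt(ref=last, alt=first)


if __name__ == "__main__":
    # Hypothetical .bim line; the ID and position are placeholders, not from the thread
    print(ref_alt_from_bim_line("13\trs_example\t0\t12345\tG\tT"))
    print(ref_alt_from_bgen_alleles("A", "G"))
    print(ref_alt_from_bgen_alleles("A", "G", ref_first=True))
```

Note that the .bim rule ignores frequency entirely: whichever allele sits in column 5 becomes ALT, which is why ALT_FREQS can exceed 0.5 when the .bim was not written by PLINK 1.x in minor-first order.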

2. Since February 2018, PLINK 2.0 --glm has been based on minor alleles, though you can force it to be based on alternate alleles instead by adding the 'omit-ref' modifier.  The output file normally has "REF", "ALT", and "A1" allele columns; the regression is based on A1.
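To make point 2 concrete with respect to the 0/1/2 coding question above, here is a minimal Python sketch of how A1 is picked and how genotypes are then counted. It is an illustration of the rules stated in this thread (minor allele by default, ALT when both frequencies are exactly 0.5 as confirmed later in the thread, always ALT with 'omit-ref'), not PLINK's code; choose_a1 and a1_dosage are hypothetical names, and alt_freq stands for whatever ALT frequency PLINK would use for the variant.

```python
from typing import Tuple


def choose_a1(ref: str, alt: str, alt_freq: float, omit_ref: bool = False) -> str:
    """Allele that --glm regresses on (A1), per the rules described in this thread.

    Default: the minor allele, with ALT chosen when both frequencies are exactly 0.5.
    omit_ref=True models the 'omit-ref' modifier: A1 is always ALT.
    """
    if omit_ref or alt_freq <= 0.5:
        return alt
    return ref


def a1_dosage(genotype: Tuple[str, str], a1: str) -> int:
    """Additive coding: number of A1 copies (0, 1 or 2) in a diploid genotype."""
    return sum(1 for allele in genotype if allele == a1)


if __name__ == "__main__":
    # rs9315201 from the .afreq above: REF=T, ALT=G, ALT_FREQS=0.971316
    a1 = choose_a1("T", "G", 0.971316)
    print(a1, a1_dosage(("T", "T"), a1), a1_dosage(("G", "G"), a1))  # T 2 0
    a1_alt = choose_a1("T", "G", 0.971316, omit_ref=True)
    print(a1_alt, a1_dosage(("T", "T"), a1_alt), a1_dosage(("G", "G"), a1_alt))  # G 0 2
```

So for rs9315201 the default run codes minor-allele (T) homozygotes as 2, while 'omit-ref' codes ALT (G) homozygotes as 2; the A1 column in the --glm output tells you which one was used.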

Hasnat A

Jul 30, 2018, 8:08:40 AM
Thank you.

Lino Ferreira

Jun 25, 2021, 9:53:05 AM
If both alleles have frequency exactly equal to 0.5, does PLINK 2.0 use the REF or the ALT allele as the effect allele (A1)?

Thank you!

Christopher Chang

Jun 25, 2021, 12:10:36 PM
When --bfile input is provided to PLINK 2.0, A1 is interpreted as ALT and A2 is interpreted as REF, regardless of frequency.

When *generating* such filesets, PLINK 1.x defaults to swapping the alleles whenever A1's frequency is above (not equal to) 0.5.
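The PLINK 1.x default described here (swap only when A1's frequency is strictly above 0.5) can be sketched in a few lines of Python. The function name is hypothetical; keep_allele_order stands in for PLINK 1.9's --keep-allele-order flag, which turns the swap off.

```python
from typing import Tuple


def plink1_bim_alleles(allele_1: str, allele_2: str, freq_1: float,
                       keep_allele_order: bool = False) -> Tuple[str, str]:
    """(5th column, 6th column) as PLINK 1.x would write them by default.

    The swap happens only when allele_1's frequency is strictly above 0.5,
    so a 0.5/0.5 variant keeps its original order. keep_allele_order=True
    stands in for PLINK 1.9's --keep-allele-order, which disables the swap.
    """
    if not keep_allele_order and freq_1 > 0.5:
        return allele_2, allele_1
    return allele_1, allele_2


if __name__ == "__main__":
    # rs9315201: G has frequency 0.971316 in this thread's data
    print(plink1_bim_alleles("G", "T", 0.971316))          # ('T', 'G')
    print(plink1_bim_alleles("G", "T", 0.5))               # ('G', 'T')
    print(plink1_bim_alleles("G", "T", 0.971316, True))    # ('G', 'T')
```

Combined with PLINK 2.0 reading column 5 as ALT, a fileset rewritten this way would show rs9315201 with ALT=T, and a 0.5/0.5 variant would keep whichever allele was already in column 5 as ALT.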

Lino Ferreira

Jun 25, 2021, 12:16:47 PM
I had in mind the --glm function when a PGEN input is provided to PLINK 2.0. I did a simple test, and it seems that the ALT allele is used as A1 when the frequency is equal to 0.5, which is in line with the PLINK 1.x behaviour you mention.

Thanks for the quick reply!

Christopher Chang

Jun 25, 2021, 12:22:00 PM
Oh, sorry, didn't realize you were talking about --glm; yes, A1=ALT there when both allele frequencies are 0.5.