Multi-allelicity


Michael Bloemendal

May 26, 2026, 1:28:18 AM
to plink2-users
I am trying to figure out what PLINK really does with multi-allelic SNPs.
Let's assume a tri-allelic SNP with A as the reference allele (80%) and G (15%) and T (5%) as minor alleles.
As far as I understand, PLINK has two options: either only the first minor allele is taken into account, or the SNP is split into pseudo-biallelic SNPs, one for G and one for T in this case.

In neither case can I find out what really happens.
In the first case, are the T's excluded, or are they treated as A's?
In the second case, are the AT individuals excluded from the G pseudo-biallelic fit, and what happens with the GT individuals?
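To make the two scenarios in the question concrete, here is a minimal Python sketch. It is not PLINK code: the allele labels, the function name `pseudo_biallelic_dosage`, and the two policies for the "other" ALT allele (count it as REF, or set the whole call to missing) are hypothetical choices that a preprocessing tool might make. Keeping only the G record corresponds to the "first minor allele only" case; building both a G and a T record corresponds to the split.

```python
# Hypothetical illustration (not PLINK code): turning one tri-allelic variant
# (REF A, ALT G and T) into pseudo-biallelic dosages under two assumed policies.
from typing import Optional, Tuple

Genotype = Tuple[str, str]
REF = "A"  # reference allele in the example


def pseudo_biallelic_dosage(gt: Genotype, alt: str, other_policy: str) -> Optional[int]:
    """Count copies of `alt` in `gt`.

    other_policy:
      "as_ref"  -> an allele that is neither REF nor `alt` is counted like REF (0)
      "missing" -> any such allele makes the whole call missing (None)
    """
    if other_policy not in ("as_ref", "missing"):
        raise ValueError(f"unknown policy: {other_policy}")
    if other_policy == "missing" and any(a not in (REF, alt) for a in gt):
        return None
    return sum(1 for a in gt if a == alt)


genotypes = [("A", "A"), ("A", "G"), ("A", "T"), ("G", "T"), ("G", "G"), ("T", "T")]
for policy in ("as_ref", "missing"):
    g_record = [pseudo_biallelic_dosage(gt, "G", policy) for gt in genotypes]
    print(policy, g_record)
```

Under "as_ref", the G record for AA, AG, AT, GT, GG, TT is [0, 1, 0, 1, 2, 0]; under "missing", the AT, GT and TT calls become None and would drop out of any fit on that record.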

Chris Chang

May 26, 2026, 1:30:09 AM
to Michael Bloemendal, plink2-users
Please use PLINK 2.0 when working with multi-allelic SNPs.


Michael Bloemendal

May 26, 2026, 1:37:17 AM
to Chris Chang, plink2-users
Thanks for the fast response.
I use PLINK 2.0, but my question is what exactly PLINK does with multi-allelic variants (see my earlier question).
To explain my question a bit more: in the tri-allelic SNP I suggested (80% A, 15% G and 5% T), there will be individuals with AA, AG, AT, GT, GG and TT.
1. What happens to individuals carrying a T when only the second allele is used? Are the T's coded as 0, meaning they are treated the same as A?
2. What happens with the T's in the pseudo-biallelic split for G, and vice versa with the G's in the pseudo-biallelic T SNP?
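For a sense of how many individuals fall into each genotype class, the expected proportions can be computed under Hardy-Weinberg equilibrium. HWE is an assumption here, not something stated in the thread. With A = 0.80, G = 0.15 and T = 0.05 this gives AA 0.64, AG 0.24, AT 0.08, GG 0.0225, GT 0.015 and TT 0.0025, so the GT and TT classes are small.

```python
# Expected genotype proportions under Hardy-Weinberg equilibrium (an assumption;
# real samples may deviate) for the example allele frequencies.
from itertools import combinations_with_replacement
from typing import Dict

freqs = {"A": 0.80, "G": 0.15, "T": 0.05}


def hwe_genotype_freqs(allele_freqs: Dict[str, float]) -> Dict[str, float]:
    """Homozygote p^2, heterozygote 2pq, for every unordered allele pair."""
    out = {}
    for a, b in combinations_with_replacement(allele_freqs, 2):
        p, q = allele_freqs[a], allele_freqs[b]
        out[a + b] = p * q if a == b else 2 * p * q
    return out


for gt, f in hwe_genotype_freqs(freqs).items():
    print(f"{gt}: {f:.4f}")
```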
  

On Tue, 26 May 2026 at 08:30, Chris Chang <chrch...@gmail.com> wrote:

Chris Chang

May 26, 2026, 3:47:09 AM
to Michael Bloemendal, plink2-users
PLINK 2.0 can directly handle multiallelic variants, without e.g. a pseudo-biallelic split.  To the extent your question makes any sense, it relates to how the data was preprocessed, and does not seem to have anything to do with PLINK 2.0.

Michael Bloemendal

May 26, 2026, 4:04:00 AM
to Chris Chang, plink2-users
Sorry for bothering you again, but this raises a few questions for me.

1. Was this different in Plink 1?
2. Does this mean that Plink 2.0 in fact no longer runs the GWAS as univariate regressions, but as bi- or trivariate ones? That would be a deviation from what is described in most of the literature.


On Tue, 26 May 2026 at 10:46, Chris Chang <chrch...@gmail.com> wrote:

Chris Chang

May 26, 2026, 4:17:35 AM
to Michael Bloemendal, plink2-users
1. Yes, Plink 1.x only supported biallelic variants.

2. Yes, from the Plink 2.0 --glm documentation:
"for multiallelic variants, G normally contains one column for each nonmajor allele. 'omit-ref' changes this to one column for each ALT allele.
If some but not all of these allele columns are constant, the constant columns are omitted. (Before 20 Mar 2020, the entire variant was skipped in this case.)"

This behavior has been stable for >6 years.
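The quoted documentation can be sketched in Python for a multiallelic variant. This is a hedged illustration, not PLINK's implementation: genotypes are assumed to be hard calls given as pairs of allele indices (0 = REF, 1 and up = ALT alleles) with no missing calls, the "major" allele is taken to be the most frequent allele with ties broken toward the lower index (an assumption), and the function name `glm_genotype_columns` is hypothetical.

```python
# Sketch of per-allele genotype columns for a multiallelic variant, following the
# quoted --glm text: one column per non-major allele (or per ALT allele with
# omit-ref), with constant columns dropped. Not PLINK's code.
from collections import Counter
from typing import List, Sequence, Tuple


def glm_genotype_columns(
    genotypes: Sequence[Tuple[int, int]], n_alleles: int, omit_ref: bool = False
) -> Tuple[List[int], List[List[int]]]:
    """Return (allele index for each kept column, per-sample allele-count rows)."""
    counts = Counter(a for gt in genotypes for a in gt)
    if omit_ref:
        excluded = 0  # REF is the allele without a column
    else:
        # most frequent allele; ties go to the lower index (assumed tie rule)
        excluded = max(range(n_alleles), key=lambda a: (counts[a], -a))
    candidates = [a for a in range(n_alleles) if a != excluded]
    rows = [[gt.count(a) for a in candidates] for gt in genotypes]
    # omit columns whose value is the same for every sample
    keep = [j for j in range(len(candidates)) if len({r[j] for r in rows}) > 1]
    return [candidates[j] for j in keep], [[r[j] for j in keep] for r in rows]


# 0 = A, 1 = G, 2 = T; samples AA, AA, AG, AT, GT, GG
cols, rows = glm_genotype_columns([(0, 0), (0, 0), (0, 1), (0, 2), (1, 2), (1, 1)], 3)
print(cols, rows)
```

Here the columns are `[1, 2]` (G and T), and the GT individual contributes the row `[1, 1]` instead of being dropped. If no sample carried T, the T column would be constant and would be omitted; if every column were constant, this sketch returns no columns, and a caller would skip the variant.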

Michael Bloemendal

May 26, 2026, 8:06:45 AM
to Chris Chang, plink2-users
Just to be sure. Does that mean that in fact the following equation is used:    

[Equation image not preserved]
And an additional question: do you have (apart from the documentation) a reference describing the changes you made from PLINK 1.9 to 2.0, and why you made them?


On Tue, 26 May 2026 at 11:17, Chris Chang <chrch...@gmail.com> wrote:

Chris Chang

May 26, 2026, 9:46:37 PM
to Michael Bloemendal, plink2-users
1. If every SNP had 4 alleles, then that equation would accurately describe the calculation.  (In practice, you probably have lots of biallelic variants, and some variants with 3 or more alleles.)
2. The second paragraph and the "What's new?" section of https://www.cog-genomics.org/plink/2.0/ summarize the main changes and new features in Plink 2.0.  (Note that the multiallelic-variant regression we just discussed is a new feature rather than a change.  Plink 2.0's linear regression still handles biallelic variants in the same way as Plink 1; it's just that Plink 1 doesn't have a built-in notion of multiallelic variants at all.)
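A general form of the per-variant linear model consistent with the quoted --glm documentation is sketched below in our own notation; it is not a reproduction of the equation from the earlier message. Here y_i is the phenotype of sample i, K is the number of alleles at the variant, G_{ik} in {0, 1, 2} is the count of the k-th non-major allele (the k-th ALT allele with 'omit-ref'), C_{ij} are the covariates, and epsilon_i is the residual.

```latex
y_i = \beta_0 + \sum_{k=1}^{K-1} \beta_k \, G_{ik} + \sum_{j=1}^{J} \gamma_j \, C_{ij} + \varepsilon_i
```

For K = 4 this has three genotype terms; for a biallelic variant it reduces to a single genotype term, the familiar single-SNP model. Constant genotype columns are omitted, so fewer terms may appear for a given variant. For logistic regression, the left-hand side would be the log-odds of the binary phenotype instead of y_i, with no residual term.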

Michael Bloemendal

May 27, 2026, 1:34:51 AM
to Chris Chang, plink2-users
Great. Thanks for giving me better insight into PLINK.

My next question is what PLINK 1 does with multiallelic variants. I saw in the literature that (i) some programs simply discard the whole SNP or the rarer variant at the SNP, whereas (ii) others split it into pseudo-biallelic variants.

If it is split in PLINK 1:
Let's assume a tri-allelic SNP with 80% A, 15% G and 5% T, meaning there will be individuals with AA, AG, AT, GT, GG and TT.
What happens with the T's in the pseudo-biallelic split for G? Are they counted as 0, meaning they are in fact treated as the reference allele? Or are they discarded, meaning that the sample size decreases?

If the rarer variant is discarded in PLINK 1, a similar question holds, unless the whole SNP is discarded.
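Whether a splitting step recodes the other ALT allele as REF or sets the call to missing depends on the tool that performs it. If such calls are set to missing, the sample-size loss can be estimated under Hardy-Weinberg equilibrium (an assumption): a call survives in the record for a given ALT allele only if both alleles are REF or that ALT, so the retained fraction is (p_REF + p_ALT)^2. The function name `retained_fraction` and the sample size `n` below are hypothetical.

```python
# Expected fraction of samples kept in a pseudo-biallelic record when calls that
# carry any *other* ALT allele are set to missing, assuming Hardy-Weinberg equilibrium.


def retained_fraction(p_ref: float, p_alt: float) -> float:
    """Both alleles must be REF or this ALT: probability (p_ref + p_alt)^2."""
    return (p_ref + p_alt) ** 2


n = 10_000  # hypothetical sample size
for alt, p_alt in (("G", 0.15), ("T", 0.05)):
    frac = retained_fraction(0.80, p_alt)
    print(alt, round(frac, 4), round(n * frac))
```

With the example frequencies, the G record keeps 0.95^2 = 0.9025 of the samples and the T record only 0.85^2 = 0.7225, so roughly 28% of samples would drop out of the T fit under this policy. Under the "recode as REF" policy, no samples are lost, but carriers of the other ALT allele are pooled with REF.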

On Wed, 27 May 2026 at 04:46, Chris Chang <chrch...@gmail.com> wrote:

Chris Chang

May 27, 2026, 2:14:38 AM
to Michael Bloemendal, plink2-users
Again, not a PLINK question.  Instead, it’s a question about the program/command performing the pseudo-biallelic split, which is likely to be  .