What are A1 and A2 in the GWAS output?


Avni Kaur

Apr 12, 2024, 7:17:25 AM
to plink2-users
Dear Plink users,

I have a quick question about what A1 and A2 are when running a meta-analysis.

This is the output from my GWAS (see attached: Screenshot 2024-04-11 at 16.08.41.png).
It's confusing to me because in some cases the A1 column matches REF and in others A1 matches ALT.

I need to specify A1 and A2. I have specified A1 = REF and A2 = ALT; however, the PLINK manual states: 'In Plink, A1 is usually the minor allele and A2 the major allele. The allele matching the reference genome ("Ref") is more likely to be major/A2, but they won't always match.' It also says 'A1 is always the effect allele'.

In that case, is it A1 = A1 and A2 = REF, rather than what I put initially?

Thanks and best wishes

A

Matthew Maher

Apr 12, 2024, 10:18:38 AM
to Avni Kaur, plink2-users
If you're doing a meta-analysis, you need to establish, for each input, which allele is the effect/tested allele (i.e. the allele that a positive BETA is associated with). For most GWAS tools, A1 or Allele1 is the effect allele. But be alert: for SAIGE output, for example, Allele2 is the effect allele.
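
As an illustration of why the effect allele matters, here is a minimal Python sketch of aligning effect sizes from two inputs to a common effect allele before combining them. The function name `align_beta`, the dictionary keys, and the example values are hypothetical and are not part of PLINK or SAIGE; the only assumption is that each input reports a BETA for a known effect allele of a biallelic variant.

```python
def align_beta(effect_allele, other_allele, beta, target_effect_allele):
    """Return beta expressed relative to target_effect_allele.

    Returns None when the target allele is neither allele of the variant,
    which signals that the two inputs do not describe the same alleles.
    """
    if target_effect_allele == effect_allele:
        return beta
    if target_effect_allele == other_allele:
        # Swapping the effect and other allele flips the sign of the effect.
        return -beta
    return None


# Hypothetical rows from two studies reporting the same biallelic variant.
study_a = {"effect": "T", "other": "C", "beta": 0.12}
study_b = {"effect": "C", "other": "T", "beta": -0.10}

aligned = [
    align_beta(s["effect"], s["other"], s["beta"], target_effect_allele="T")
    for s in (study_a, study_b)
]
print(aligned)
```

The sign flip applies to BETA on the linear or log-odds scale; an odds ratio would be inverted instead.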

You appear to be showing output from PLINK2 --glm. A1 is the effect/tested allele. If you want A1 to always equal ALT, you need to add the 'omit-ref' modifier. From the docs:

For biallelic variants, G normally contains a single column with minor allele dosages. To make it always contain ALT allele dosages instead, add the 'omit-ref' modifier. (Why isn't omit-ref the default? We'll get to that.)

I'm a bit unclear about that parenthetical forward reference at the end - I'm not sure what subsequent discussion it refers to.   Can Christopher possibly clarify that?  




Christopher Chang

Apr 12, 2024, 11:56:33 PM
to plink2-users
The parenthetical refers to the subsequent discussion of the 'interaction' modifier.

  • The 'interaction' modifier adds genotype x covariate interaction terms to G. More precisely, the additional columns are entrywise (Hadamard) products between a genotype/dosage column and a (non-intercept) covariate column.
    • When G contains a major allele with >90% frequency, the interaction terms can be very highly correlated with the genotype column. This is likely to cause the multicollinearity check to fail, and it isn't a situation where overriding the multicollinearity-check defaults is wise—numerical stability problems are likely.
      So you probably don't want to use 'omit-ref' when performing interaction testing. (And this is why omit-ref is no longer --glm's default setting; it was, back in 2017, until the ~5th time this specific problem came up...)

Matthew Maher

Apr 13, 2024, 11:52:12 AM
to Christopher Chang, plink2-users
Thank you for that clarification.  Not sure how I missed that.

Avni Kaur

Apr 13, 2024, 12:40:06 PM
to plink2-users
Thanks for your response. Got it: the effect allele is A1. But what is A2 in this case, given that A1 sometimes equals REF and sometimes equals ALT?

Matthew Maher

Apr 16, 2024, 7:42:16 AM
to Avni Kaur, plink2-users
In the context of a biallelic association test, it's just called the 'non-effect' or 'other' allele.

Don't confuse the two 'effect/other' labels with the two 'major/minor' labels or the two 'REF/ALT' labels. Those are three completely different concepts. When PLINK2 needs to select an allele to be the 'effect' allele, it defaults to choosing the 'minor' allele. But if you include 'omit-ref', it will instead choose the 'ALT' allele.

Avni Kaur

Apr 17, 2024, 5:03:18 AM
to plink2-users
Thanks for your response. So can I just leave A1 = REF and A2 = ALT?

Avni Kaur

Apr 17, 2024, 12:30:38 PM
to plink2-users
OMG super sorry for the confusion earlier. I understand now. Let me clarify for future PLINK users who may encounter the same problem with the meta-analysis function:

A1 represents the allele under test, i.e. the effect allele. However, depending on the variant, A1 may match either the reference (REF) or the alternate (ALT) allele. For the PLINK meta-analysis function, you need to identify the non-effect allele (A2) alongside A1.

Fixing A2 to REF, or to ALT, for every variant could lead to mismatches or errors, because A1 in my dataset might match either one. To address this, I used the --glm cols=+ax modifier when rerunning my GWAS. This generated a column labeled AX in the output file, which contains the non-effect alleles.

Now, when conducting the meta-analysis using PLINK, I point the --meta-analysis-a1-field parameter at the A1 column, since A1 is always the effect/test allele. For the A2 allele, I use the AX column, which is always the other allele relative to A1.


Roy

Mar 28, 2025, 5:54:38 PM
to plink2-users

Hi,

Instead of rerunning my GWAS with the --glm cols=+ax option to generate the AX column (which provides the non-effect allele), can I infer A2 as the counterpart of A1? In my current GWAS output, A1 is the effect allele and I have REF and ALT columns available. So, if for a SNP, A1 equals REF, I would assign A2 as ALT, and if A1 equals ALT, then A2 would be REF.

Is this approach valid, or would you recommend rerunning with the --glm cols=+ax flag?

Thanks.

Chris Chang

Mar 28, 2025, 5:55:42 PM
to Roy, plink2-users
For biallelic variants, yes, that always works.
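
The rule Roy describes can be sketched in a few lines of Python. The function name `infer_other_allele` is hypothetical, and the check for a comma in ALT is an assumption about how multiallelic variants appear in the output (a comma-separated ALT field); for those variants the sketch refuses to guess rather than return a possibly wrong allele.

```python
def infer_other_allele(a1, ref, alt):
    """Infer the non-effect allele (A2) for a biallelic variant.

    a1  : effect allele reported in the A1 column
    ref : REF column value
    alt : ALT column value (assumed to be a single allele)
    """
    if "," in alt:
        # Assumption: a comma in ALT marks a multiallelic variant,
        # where REF/ALT alone do not determine a single other allele.
        raise ValueError("multiallelic variant: cannot infer A2 from REF/ALT")
    if a1 == ref:
        return alt
    if a1 == alt:
        return ref
    raise ValueError(f"A1 {a1!r} matches neither REF {ref!r} nor ALT {alt!r}")


# A1 equals ALT here, so the other allele is REF.
print(infer_other_allele("G", ref="A", alt="G"))
# A1 equals REF here, so the other allele is ALT.
print(infer_other_allele("A", ref="A", alt="G"))
```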

Roy

Apr 3, 2025, 8:17:38 PM
to Chris Chang, plink2-users
Hi, 

Thanks for your response. I have another question: 

Can running PLINK2 with the --glm cols=+ax flag to obtain a non-effect (A2) column fix the strand ambiguity problem inherent in ambiguous SNPs (A/T, C/G)? My understanding is that this option only outputs the counterpart allele based on the REF and ALT columns, without addressing the underlying strand alignment issues for ambiguous SNPs.

Thanks.

Christopher Chang

Apr 3, 2025, 11:45:34 PM
to plink2-users
--glm cols=+ax doesn't provide any information that isn't already present in the REF/ALT columns.  (Modern genomic data processing pipelines should not have A/T or C/G SNP ambiguity problems, though.)
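
For readers who still want to flag the A/T and C/G SNPs mentioned above in older data, a minimal Python sketch follows. The function name `is_strand_ambiguous` is hypothetical and is not a PLINK feature; it only tests whether the two alleles of a SNP are complements of each other, which is what makes the strand impossible to tell from the alleles alone.

```python
# Watson-Crick complements for single-base alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(ref, alt):
    """Return True when ref/alt form an A/T or C/G pair.

    Alleles longer than one base (indels) or non-ACGT codes are not in
    COMPLEMENT, so they are reported as not ambiguous by this check.
    """
    return COMPLEMENT.get(ref.upper()) == alt.upper()


for ref, alt in [("A", "T"), ("C", "G"), ("A", "G"), ("AT", "A")]:
    print(ref, alt, is_strand_ambiguous(ref, alt))
```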