Effect allele vs minor allele vs non-reference allele in PheWeb

James Pirruccello

Mar 9, 2021, 9:34:59 AM
to PheWeb-UMich
I'm hoping to raise an issue that will either expose my misunderstanding of PheWeb or point to a way it could be improved: ensuring that all statements on a PheWeb variant page refer to the same allele.

I am looking at a specific example: https://pheweb.org/UKB-TOPMed/variant/3:12799435-T-C 

As noted on that page, "MAF ranges from 0.33 to 0.33". But I don't see a way to tell, from this page, which allele is the minor allele. According to dbSNP, in African and European ancestries the minor allele is the reference T allele. So the first point I would raise is that I can't determine which allele the MAF refers to without leaving PheWeb. It would be more intuitive to either (a) state which allele is the minor allele, or (b) always provide the allele frequency of the non-reference allele rather than the minor allele.
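
To make the ambiguity concrete, here is a minimal sketch of how a MAF relates to the non-reference (alternate) allele frequency. The function name and the two example frequencies are hypothetical and only illustrative; this is not PheWeb code.

```python
from typing import Tuple


def minor_allele(ref: str, alt: str, alt_af: float) -> Tuple[str, float]:
    """Return (minor allele, MAF) given the alternate-allele frequency."""
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError("alt_af must be between 0 and 1")
    # The MAF is the smaller of the two allele frequencies at a biallelic site.
    if alt_af <= 0.5:
        return alt, alt_af
    return ref, 1.0 - alt_af


# Two hypothetical T/C variants that both show a MAF of about 0.33:
print(minor_allele("T", "C", 0.67))  # the reference allele T is the minor allele
print(minor_allele("T", "C", 0.33))  # the alternate allele C is the minor allele
```

Both calls give the same MAF, which is why the MAF alone cannot tell a reader which allele it describes, whereas the alternate-allele frequency can.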

At the bottom of that page, disease associations are displayed. However, there is no indication of which allele is the risk-increasing allele.

Hovering over the arrows on the graph shows case and control allele frequencies. For atrial fibrillation, the case allele frequency is shown as 68% and the control allele frequency as 67%. There is no statement of which allele these frequencies refer to, but from dbSNP I have now inferred that the risk-increasing allele is the C allele (the non-reference and major allele).
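
A minimal sketch of that inference, assuming the displayed frequencies refer to the alternate allele. The function name is hypothetical, and comparing raw frequencies is only a rough check on direction, not a substitute for the effect estimate from the association model.

```python
from typing import Optional


def allele_enriched_in_cases(ref: str, alt: str,
                             case_alt_af: float,
                             control_alt_af: float) -> Optional[str]:
    """Return the allele that is more frequent among cases than controls.

    Both frequencies are assumed to describe the alternate allele.
    Returns None when the two frequencies are equal.
    """
    if case_alt_af > control_alt_af:
        return alt
    if case_alt_af < control_alt_af:
        return ref
    return None


# Atrial fibrillation example from the variant page: 68% in cases, 67% in controls.
print(allele_enriched_in_cases("T", "C", 0.68, 0.67))  # C
```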

Assuming I have interpreted all of this correctly, it seems that the disease association is indeed being reported with respect to the non-reference allele, and so I think my initial confusion stemmed from the reporting of the MAF without indicating which allele was minor. Explicitly stating that the non-reference allele is the effect allele would be helpful, and I would also suggest displaying the non-reference allele frequency rather than (or in addition to) the MAF, since that would ensure that everything is aligned and stated with regard to the non-reference allele.

Thanks for a great tool and for your consideration.

Best,

James

pjvh

Mar 10, 2021, 1:07:52 AM
to PheWeb-UMich
Yeah, it sounds like you figured everything out correctly. You're right that for a variant "T / C", AF refers to the alternate allele C, and that T is on the hg38 reference genome.
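
As a rough illustration of that convention, the variant ID in the URL above can be split into its parts, with the first allele being the reference and the second the alternate. The helper below is a hypothetical sketch, not PheWeb's own parser, and it assumes the ID always has the form chrom:pos-ref-alt.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str  # allele on the reference genome
    alt: str  # alternate allele; AF refers to this one


def parse_variant_id(variant_id: str) -> Variant:
    """Split an ID such as '3:12799435-T-C' into chrom, pos, ref and alt."""
    chrom, rest = variant_id.split(":", 1)
    pos, ref, alt = rest.split("-")
    return Variant(chrom, int(pos), ref, alt)


v = parse_variant_id("3:12799435-T-C")
print(f"ref={v.ref} alt={v.alt}")  # ref=T alt=C
```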

Yeah, that "MAF range" was always kind of bad.  I just improved it so that now it'll show AF range for datasets where every phenotype has AF.

Sometimes I get questions about reference allele vs alternate allele.  I tried to explain it clearly in the loading instructions, but it's not clear on the website.  I just added text on https://pheweb.org/UKB-TOPMed/about explaining that AF refers to the alternate allele, and I'll add tooltips to explain it too.  Please let me know if you have a better way to word that explanation.

I added a column on the variant page to show effect size.

"Case AF" and "Control AF" refer to the AF among cases and controls for this binary trait.  I changed the text to "AF among cases" to make that clearer.

Sorry that these things weren't very clear.  I need to add tooltips or something to provide more information.

Please let me know if these changes haven't fully addressed the problems, and let me know if you have more suggestions.

Peter