Choosing effect allele for allelic scoring

Lee Stopak

Jan 9, 2019, 12:34:52 PM

to plink2-users

Hi PLINK community,

I have a question about how the effect allele is chosen for allelic scoring. I will elaborate my question in the example below:

Say we have SNP1 with alleles C and T. The odds ratio for T is OR_T = 3, and the odds ratio for C is OR_C = 1/3.

In this case we can approach it in two ways: T is the effect allele and C is the non-effect allele. This means that T increases risk for disease, and that if we compare to the homozygote of the non-effect allele, C, genotype CC = 1, genotype TC = 3, and genotype TT = 9. Therefore if we score using ln(OR), the three possible betas are: CC = ln(1), TC = ln(3), or TT = 2*ln(3).

In the other case we can approach it as follows: C is the effect allele and T is the non-effect allele. This means that C decreases risk for disease, and that if we compare to the homozygote of the non-effect allele, T, genotype TT = 1, genotype TC = 1/3, and genotype CC = 1/9. Therefore if we score using ln(OR), the three possible betas are TT = ln(1), TC = ln(1/3), or CC = 2*ln(1/3).
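The two codings described above can be checked numerically with a minimal sketch. It assumes a purely additive log-odds model, and the names `genotype_score`, `scores_t` and `scores_c` are made up for illustration; this is not how PLINK computes scores internally.

```python
import math

# Per-allele odds ratios from the example: T relative to C, and C relative to T.
OR_T = 3.0
OR_C = 1.0 / 3.0

def genotype_score(genotype, effect_allele, odds_ratio):
    """Additive log-odds score: (copies of the effect allele) * ln(OR)."""
    return genotype.count(effect_allele) * math.log(odds_ratio)

genotypes = ["CC", "TC", "TT"]
scores_t = {g: genotype_score(g, "T", OR_T) for g in genotypes}  # T as effect allele
scores_c = {g: genotype_score(g, "C", OR_C) for g in genotypes}  # C as effect allele

# Difference between the two codings for each genotype.
offsets = {g: scores_t[g] - scores_c[g] for g in genotypes}

for g in genotypes:
    print(g, round(scores_t[g], 4), round(scores_c[g], 4), round(offsets[g], 4))
```

Under these assumptions every genotype's score differs between the two codings by the same amount, 2*ln(3), so switching the effect allele shifts all scores for this SNP by a constant and leaves the differences between genotypes unchanged.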

In these cases then, we will get the same total number, but with opposite signs for scoring depending on which allele we choose as the effect allele. Therefore, when scoring, if we are summing betas for each SNP, if we choose the effect allele to always increase risk, the number will be a very large positive number, and if we choose the effect allele to always decrease risk, the number will be a very large negative number.

So my question is, when scoring, how do we choose the effect and non-effect allele? Does it depend on the allele frequencies?

Thanks :)

Christopher Chang

Jan 9, 2019, 1:42:31 PM

to plink2-users

Either choice works, as long as you interpret the results properly (i.e. know what the baseline score of 0 corresponds to).

Lee Stopak

Jan 10, 2019, 5:36:11 AM

to plink2-users

Hi Christopher,

Thanks for the reply. I am aware that the choices are identical in meaning. Let me try to clarify my question with another example.

We have SNP1 and SNP2. Say we have SNP1 with alleles C and T, where OR_T = 3 and OR_C = 1/3. SNP2 has alleles A and G, where OR_A = 3 and OR_G = 1/3.

Now say we have two individuals. P1 with SNP1: TT and SNP2: AA and P2 with SNP1: CC and SNP2: GG. In this case, if we use OR > 1 as the effect allele, T is the effect allele for SNP1 and A is the effect allele for SNP2. P1 will have a PRS of 2*ln(3) + 2*ln(3), and P2 will have a PRS of 0. In the alternate case, if we use OR < 1 as the effect allele, C will be the effect allele for SNP1 and G is the effect allele for SNP2. P1 will have a PRS of 0, and P2 will have a PRS of -2*ln(3) - 2*ln(3).

In this case then, the PRS of the individuals are only meaningful when compared with each other, because the raw PRS score depends totally on which allele is chosen as the effect allele.
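The two-person example can be reproduced with a short sketch. The weight tables `risk_coding` and `protective_coding` and the helper `prs` are hypothetical names, and the scoring is assumed to be a plain sum of allele count times ln(OR).

```python
import math

# Hypothetical weight tables: SNP -> (effect allele, ln(OR) for that allele).
risk_coding = {"SNP1": ("T", math.log(3)), "SNP2": ("A", math.log(3))}
protective_coding = {"SNP1": ("C", math.log(1 / 3)), "SNP2": ("G", math.log(1 / 3))}

people = {
    "P1": {"SNP1": "TT", "SNP2": "AA"},
    "P2": {"SNP1": "CC", "SNP2": "GG"},
}

def prs(genotypes, coding):
    """Sum over SNPs of (copies of the effect allele) * beta."""
    return sum(genotypes[snp].count(allele) * beta
               for snp, (allele, beta) in coding.items())

prs_risk = {name: prs(g, risk_coding) for name, g in people.items()}
prs_protective = {name: prs(g, protective_coding) for name, g in people.items()}

# The raw scores change with the coding, but the gap between P1 and P2 does not.
gap_risk = prs_risk["P1"] - prs_risk["P2"]
gap_protective = prs_protective["P1"] - prs_protective["P2"]
print(prs_risk, prs_protective, gap_risk, gap_protective)
```

In this sketch the raw scores depend on the coding, while the gap between P1 and P2 is 4*ln(3) either way, which is consistent with the observation that only the comparison between individuals carries information.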

So if we want to apply a PRS on an individual level, is this only possible after analysis, such as performing a case/control logistic regression using PRS as the input?

Thanks :)

Lee Stopak

Jan 11, 2019, 3:54:40 AM

to plink2-users

Hi all,

Is the final score calculated by normalizing to a relative scale? For example, if we are calculating risks for many people, we can assume a normal distribution and normalize the mean to 0, so that all values above 0 indicate increased risk and all values below 0 indicate decreased risk.
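The normalization idea in this question can be sketched as follows. The five-person cohort is invented for illustration, and nothing in this thread states that PLINK applies such a step itself; this only shows what centering and scaling would do to scores computed under either coding.

```python
import math
import statistics

# Hypothetical cohort with genotypes at the two SNPs from the earlier example.
cohort = {
    "P1": {"SNP1": "TT", "SNP2": "AA"},
    "P2": {"SNP1": "CC", "SNP2": "GG"},
    "P3": {"SNP1": "TC", "SNP2": "AG"},
    "P4": {"SNP1": "TT", "SNP2": "AG"},
    "P5": {"SNP1": "CC", "SNP2": "AG"},
}
risk_coding = {"SNP1": ("T", math.log(3)), "SNP2": ("A", math.log(3))}
protective_coding = {"SNP1": ("C", -math.log(3)), "SNP2": ("G", -math.log(3))}

def prs(genotypes, coding):
    """Sum over SNPs of (copies of the effect allele) * beta."""
    return sum(genotypes[snp].count(allele) * beta
               for snp, (allele, beta) in coding.items())

def standardize(scores):
    """Center on the cohort mean and scale by the population standard deviation."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {name: (s - mean) / sd for name, s in scores.items()}

z_risk = standardize({n: prs(g, risk_coding) for n, g in cohort.items()})
z_protective = standardize({n: prs(g, protective_coding) for n, g in cohort.items()})

for name in cohort:
    print(name, round(z_risk[name], 4), round(z_protective[name], 4))
```

Because the two codings differ only by a constant shift in this example, the standardized scores coincide. A standardized score above 0 then means "above this cohort's average", which is a relative statement and not an absolute risk.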

Lee

张丹丹

Jul 20, 2021, 5:19:48 AM

to plink2-users

Personally speaking, consistently using either the allele with OR > 1 or the allele with OR < 1 as the effect allele across all SNPs makes no difference, as long as the direction is tracked. What does make a difference is using the minor allele (by MAF) or the alt allele as the effect allele, which makes the OR direction discordant across SNPs. That discordance in turn changes the PRS a lot.

Quoting the 2009 Nature paper "Common polygenic variation contributes to risk of schizophrenia and bipolar disorder" (which refers to the PLINK tools):

"Based on the ISC controls, we obtained allele frequency quintiles with the following thresholds (truncated at 2 and 98%): 0.02, 0.136, 0.351, 0.65, 0.863 and 0.98. To assign the risk-increasing, or “scored” allele, we used a Cochran-Mantel-Haenszel analysis that conditions on sample strata"
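The point about tracking direction can be illustrated with a rough sketch. It is not the Cochran-Mantel-Haenszel procedure from the quoted paper; the table layout and the names `to_risk_increasing`, `weights` and `mislabeled` are assumptions made for the example. It re-expresses each SNP so that the scored allele has OR >= 1, and contrasts that with swapping an allele label while leaving its OR untouched.

```python
import math

def to_risk_increasing(effect_allele, other_allele, odds_ratio):
    """Re-express a row so the scored allele has OR >= 1 (flip allele and OR together)."""
    if odds_ratio < 1:
        return other_allele, effect_allele, 1.0 / odds_ratio
    return effect_allele, other_allele, odds_ratio

# Hypothetical table: SNP -> (effect allele, other allele, OR for the effect allele).
weights = {"SNP1": ("T", "C", 3.0), "SNP2": ("G", "A", 1 / 3)}
harmonized = {snp: to_risk_increasing(*row) for snp, row in weights.items()}

def prs(genotypes, table):
    """Sum over SNPs of (copies of the effect allele) * ln(OR)."""
    return sum(genotypes[snp].count(eff) * math.log(odds_ratio)
               for snp, (eff, _other, odds_ratio) in table.items())

p1 = {"SNP1": "TT", "SNP2": "AA"}
p2 = {"SNP1": "CC", "SNP2": "GG"}

# Consistent flip: the gap between the two people is preserved.
gap_original = prs(p1, weights) - prs(p2, weights)
gap_harmonized = prs(p1, harmonized) - prs(p2, harmonized)

# Inconsistent flip: SNP2's allele label is swapped but its OR is left unchanged.
mislabeled = dict(weights, SNP2=("A", "G", 1 / 3))
gap_mislabeled = prs(p1, mislabeled) - prs(p2, mislabeled)

print(gap_original, gap_harmonized, gap_mislabeled)
```

With the consistent flip the gap between the two people stays at 4*ln(3). With the mislabeled table the gap collapses to roughly 0, so two people at opposite ends of the risk range become indistinguishable.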