Hi PLINK community,
I have a question about how the effect allele is chosen for allelic scoring. I will illustrate my question with the example below:
Say we have SNP1 with alleles C and T. The odds ratio for T is OR_T = 3, and the odds ratio for C is OR_C = 1/3.
We can approach this in two ways. In the first, T is the effect allele and C is the non-effect allele. This means that T increases risk for disease, and that, relative to the homozygote of the non-effect allele (CC), the genotype odds ratios are CC = 1, TC = 3, and TT = 9. Therefore, if we score using ln(OR), the three possible betas are: CC = ln(1) = 0, TC = ln(3), and TT = 2*ln(3).
In the other case, C is the effect allele and T is the non-effect allele. This means that C decreases risk for disease, and that, relative to the homozygote of the non-effect allele (TT), the genotype odds ratios are TT = 1, TC = 1/3, and CC = 1/9. Therefore, if we score using ln(OR), the three possible betas are: TT = ln(1) = 0, TC = ln(1/3), and CC = 2*ln(1/3).
So the two codings give betas of the same magnitude but with opposite signs, depending on which allele we choose as the effect allele. Therefore, when we sum the betas across SNPs, the total will be a large positive number if we always choose the risk-increasing allele as the effect allele, and a large negative number if we always choose the risk-decreasing allele.
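To make the arithmetic concrete, here is a small Python sketch of the two codings for SNP1. This is only my own illustration of the additive ln(OR) scoring described above, not PLINK's actual scoring code; the function name `snp_score` and the variable names are made up for this example.

```python
import math

def snp_score(genotype, effect_allele, odds_ratio):
    """Additive score: (copies of the effect allele) * ln(OR).

    Hypothetical helper for illustration only, not a PLINK function.
    """
    return genotype.count(effect_allele) * math.log(odds_ratio)

genotypes = ["CC", "TC", "TT"]

# Coding 1: T is the effect allele, OR_T = 3
scores_t = {g: snp_score(g, "T", 3.0) for g in genotypes}

# Coding 2: C is the effect allele, OR_C = 1/3
scores_c = {g: snp_score(g, "C", 1.0 / 3.0) for g in genotypes}

# Print both codings side by side, plus their difference per genotype
for g in genotypes:
    diff = scores_t[g] - scores_c[g]
    print(f"{g}: T-coded = {scores_t[g]:+.4f}, "
          f"C-coded = {scores_c[g]:+.4f}, difference = {diff:+.4f}")
```

In this sketch, the difference between the two codings comes out the same for every genotype (2*ln(3)), so for this one SNP the choice of effect allele shifts each person's score by a constant.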
So my question is, when scoring, how do we choose the effect and non-effect allele? Does it depend on the allele frequencies?
Thanks :)