Plink 2.0 Odds ratios calculation when using --glm (logistic regression on binary traits)


Luke Cassar

Jul 23, 2024, 8:31:25 AM
to plink2-users
Hello everyone,

I wanted to ask about the odds ratio calculations carried out by plink2 when running its --glm option on binary data.

I converted a VCF file of about 33 million SNPs and ~800 samples of interest into plink2's --pfile format, annotated the psam file with sex and with case-control status as the phenotype, and carried out QC using filters on genotype rate, missingness, and minor allele frequency.

After filtering, I split the plink files by sex to analyse males and females separately and ran the --glm option to run a logistic regression. I also added age as a covariate in a separate .covariate file (the age column was made up of discrete values, not age groups).

After looking at the results, I noted that no variant reached a -log10(p) value above 6, but a good number of variants had odds ratios in the hundreds (e.g., 267, 351, 122).

I'm curious as to how the odds ratios are calculated in plink2's logistic regression in terms of what method is used to calculate the OR, how the contingency tables are made, and any other relevant details to understand how the calculation is done.

I'd also like to ask if there's a known reason for the discrepancy between the p-values and the ORs in plink, if there's a way to mitigate this, or if there's a way to increase the power of the association test.

Thank you for your time and patience,

Kindest regards,
Luke


Christopher Chang

Jul 24, 2024, 12:02:57 AM
to plink2-users
800 samples usually isn't enough to discover much.  A high odds-ratio is not significant if the standard error ("LOG(OR)_SE") is also too high.

plink2's logistic regression should yield results that are practically identical to R glm() with family=binomial().
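
To make the point about the standard error concrete, here is a minimal Python sketch. It assumes the reported p-value is the usual two-sided Wald test on the log-odds scale, z = log(OR) / SE, and the SE values of 2.5 and 0.5 are made up for illustration; they are not taken from the original poster's output. The function name `wald_summary` is hypothetical, not part of plink2.

```python
import math


def wald_summary(odds_ratio, log_or_se, z_crit=1.959964):
    """Return (z, two-sided p, CI lower, CI upper) for an OR and its log-scale SE.

    Assumes a Wald test on log(OR); z_crit=1.959964 gives a ~95% interval.
    """
    log_or = math.log(odds_ratio)
    z = log_or / log_or_se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    lower = math.exp(log_or - z_crit * log_or_se)
    upper = math.exp(log_or + z_crit * log_or_se)
    return z, p, lower, upper


if __name__ == "__main__":
    # Same OR, two hypothetical standard errors (illustrative values only).
    for se in (2.5, 0.5):
        z, p, lower, upper = wald_summary(267.0, se)
        print(f"OR=267 SE={se}: z={z:.2f} p={p:.3g} CI=({lower:.3g}, {upper:.3g})")
```

With the larger SE, the same OR of 267 gives a z-statistic only a little above 2 and a confidence interval spanning several orders of magnitude, which is why a large OR on its own says little about significance.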

Luke Cassar

Jul 24, 2024, 4:54:21 AM
to plink2-users
What would you consider to be a good SE threshold to filter out variants with a very high OR?

Also, would it be better to convert the odds ratios to log(OR)?

Luke Cassar

Jul 24, 2024, 5:07:00 AM
to plink2-users
I'd also like to ask what other methods you would use to determine whether an OR value is significant, and how to filter out those which are not. I'm still relatively new to GWAS, so I'm trying to learn how to navigate the results PLINK has given me.

Christopher Chang

Jul 29, 2024, 12:39:52 AM
to plink2-users
There's usually nothing else you can do with PLINK here.  The p-values were the right stat to look at; I was just explaining why the high ORs you saw did not yield the significant p-values you expected them to.

Odds-ratio vs. log(OR) is usually just a cosmetic choice.  The exception is when an odds-ratio is extremely large or small -- greater than about 10^308 or smaller than 10^{-308} -- in which case log(OR) may be necessary to enable other programs to read the number.
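
The 10^308 figure corresponds to the range of a 64-bit floating-point number. As a rough Python illustration (the helper name `odds_ratio_from_log` is hypothetical, and this says nothing about how plink2 itself handles such values), exponentiating a very large log(OR) overflows a double, while the log(OR) itself stays easily representable:

```python
import math
import sys


def odds_ratio_from_log(log_or):
    """Convert log(OR) to OR, returning inf where a 64-bit float would overflow."""
    try:
        return math.exp(log_or)
    except OverflowError:
        # math.exp raises once the result exceeds sys.float_info.max (~1.8e308).
        return math.inf


if __name__ == "__main__":
    # Largest log(OR) whose exponential still fits in a double (a bit under 710).
    print(math.log(sys.float_info.max))
    for log_or in (math.log(267.0), 710.0, -800.0):
        print(log_or, odds_ratio_from_log(log_or))
```

In the other direction, a very negative log(OR) does not raise an error in Python; the exponential silently underflows to 0.0, so the sign and magnitude are only recoverable from the log scale.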
