Hello everyone,
I wanted to ask about the odds ratio calculations carried out by plink2 when running its --glm option on binary phenotype data.
I converted a VCF file of about 33 million SNPs and ~800 samples of interest into plink2's --pfile format, annotated the .psam file with sex and case-control status as the phenotype, and carried out QC using genotype-rate, missingness, and minor allele frequency filters.
After filtering, I split the plink files by sex to analyse males and females separately and used the --glm option to run a logistic regression. I also added age as a covariate in a separate covariate file (the age column held individual age values, not age groups).
Looking at the results, no variant reached significance beyond a -log10(p) of 6, yet a good number of variants had odds ratios in the hundreds (e.g., 267, 351, 122).
I'm curious how plink2's logistic regression calculates the odds ratios: what method is used to obtain the OR, how (or whether) contingency tables are built, and any other details needed to understand the calculation.
I'd also like to ask whether there's a known reason for this discrepancy between significance (p-values) and effect size (ORs) in plink2, whether there's a way to mitigate it, and whether there's a way to increase the power of the association test.
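For reference while reading replies, here is a minimal pure-Python sketch of how an odds ratio is usually obtained from logistic regression. It is not plink2's code. It assumes an additive model, y ~ intercept + allele dosage (+ covariates such as age), fitted by Newton-Raphson maximum likelihood. The OR is then exp(beta) for the dosage term, and a Wald test on beta/SE gives the p-value, so no contingency table is involved. My understanding is that plink2 --glm follows this general scheme for binary phenotypes. Its Firth fallback, covariate handling and missing-data rules are not reproduced here. The names `fit_logistic`, `solve` and `fisher_info` and the toy data are hypothetical.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fisher_info(X, beta):
    # Fitted probabilities and the information matrix X^T W X, W = p(1 - p).
    p = [sigmoid(sum(x * b for x, b in zip(row, beta))) for row in X]
    k = len(beta)
    info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
             for j in range(k)] for i in range(k)]
    return p, info

def fit_logistic(X, y, term=1, max_iter=50, tol=1e-10):
    """Return (OR, SE of log-OR, Wald p) for column `term` of design matrix X.
    Rows of X are [1, dosage, covariates...]; y holds 0/1 phenotypes."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        p, info = fisher_info(X, beta)
        score = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                 for j in range(len(beta))]
        step = solve(info, score)  # Newton-Raphson (IRLS) update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = fisher_info(X, beta)
    unit = [1.0 if j == term else 0.0 for j in range(len(beta))]
    se = math.sqrt(solve(info, unit)[term])  # sqrt of inverse-information diagonal
    z = beta[term] / se
    return math.exp(beta[term]), se, math.erfc(abs(z) / math.sqrt(2))

# Toy data: 30/100 cases and 15/100 controls carry one copy of the allele.
X = [[1.0, 1.0]] * 30 + [[1.0, 0.0]] * 70 + [[1.0, 1.0]] * 15 + [[1.0, 0.0]] * 85
y = [1] * 100 + [0] * 100
odds_ratio, se, p = fit_logistic(X, y)
print(round(odds_ratio, 4))  # 2.4286, the cross-product ratio (30*85)/(70*15)
```

With a single binary predictor and no covariates, the fitted OR coincides with the 2x2 cross-product ratio. Adding age just appends one more column to each row of `X`, and the reported OR then becomes the allele effect adjusted for age.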
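The sketch below shows why a large OR can sit next to an unremarkable p-value. The OR is a point estimate of effect size, while the p-value reflects how precisely that estimate is known. With very few minor-allele copies, the standard error of log(OR) is huge. For simplicity this uses a different, textbook method from plink2's logistic regression: an allelic 2x2 table OR with Woolf's log-OR standard error, adding 0.5 to every cell when one is zero (Haldane-Anscombe correction). The function name `allelic_or` and the allele counts are hypothetical.

```python
import math

def allelic_or(case_alt, case_ref, ctrl_alt, ctrl_ref, z_crit=1.959964):
    """Allelic 2x2 odds ratio with Woolf's log-OR standard error.
    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe correction)."""
    cells = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided Wald p-value
    return math.exp(log_or), ci, p

# Hypothetical rare variant: 4 alt alleles among 400 case alleles, 0 among 400 control alleles.
odds_ratio, (lo, hi), p = allelic_or(4, 396, 0, 400)
print(f"OR={odds_ratio:.1f}  95% CI=({lo:.2f}, {hi:.0f})  p={p:.2f}")
```

For these counts the OR comes out at about 9. Its 95% CI runs from below 1 to well over 100, and p is around 0.14. A very large OR on a rare variant therefore says little on its own unless it is read together with its standard error or confidence interval.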
Thank you for your time and patience,
Kindest regards,
Luke