Hello :)
I've been asked to replicate some analysis from plink2 in R and hoping for guidance to achieve the same results. I have a phenotype (1/2), gender (1/2), country (1,2,3,4) and various SNPs encoded additively (0,1,2).
In plink2 I run the following command
--pfile hq_exsomes --glm --covar-name Sex,Country_of_origin --maf 0.01 -pheno-name Phenotype -out model_output
I obtain good results which match the literature giving me confidence. Log is attached.
In R I've tried to replicate the analysis (similarly filtering with a --maf of 0.01) by performing either a logistic regression or firth regression like
glm(Phenotype ~ SNP + Gender + Country_of_origin, data = model_data, family=binomial(link="logit")) or Firth regression using the logistf() function. Whilst the p-values and effect sizes are similar, they are not the same.
Does plink2 take any additional steps during modelling? For example, my phenotype variable is skewed 3:1 cases:controls, not sure if plink2 does anything to account for that perhaps? Any help appreciated, looked in documentation but couldn't find anything to help..
Thanks for your time and for this great software :)