replicate --glm and --glm genotypic in R

4 views
Skip to first unread message

Luis

unread,
Sep 26, 2025, 7:24:50 PM (24 hours ago) Sep 26
to plink2-users
Hi Chris,

I am trying to get the covariance matrix for the coefficients obtained from GWAS, but I couldn’t find any way to obtain it directly from PLINK (please correct me if I’m wrong).

So I tried extracting the genotype data from a PLINK bfile and running the regression in R. For the additive model, I obtained a similar but not identical result, while for the dominance deviation model, I couldn’t get a comparable result. 

Based on my understanding, --glm simply performs a linear regression, while --glm genotypic also performs a linear regression but adds another covariate that codes heterozygous sites as 1 and homozygous sites as 0. 

I’ve attached the code, results and the log file for the exported allele counts and GWAS for your reference. Could you please let me know where I did wrong? Thanks in advance for your help!


# additive for variant rs75267490 in R
Call: lm(formula = height ~ rs75267490_G..C. + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + BIRTH_YEAR, data = m_df)
Coefficients: Estimate Std. Error t value Pr(>|t|) rs75267490_G..C. 0.13063 0.03879 3.368 0.000758 ***

# additive for variant from rs75267490 from --glm
#CHROM  POS     ID      REF     ALT     PROVISIONAL_REF?        A1      OMITTED A1_FREQ TEST    OBS_CT  BETA    SE      T_STAT  P       ERRCODE
1       954032  rs75267490      C       G       Y       G       C       0.340962        ADD     181912  0.0956024       0.0230831       4.14166 3.4495e-05      .


# domdev for variant rs1291348 in R
Call: lm(formula = height ~ rs1291348_G..A. + rs1291348_HET + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + BIRTH_YEAR, data = m_df) Residuals: Min 1Q Median 3Q Max -188.029 -4.058 0.352 4.812 32.005 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 175.43819 0.02969 5909.009 < 2e-16 *** rs1291348_G..A. -0.21342 0.10288 -2.074 0.038038 * rs1291348_HET 0.28947 0.11642 2.486 0.012905 *

# domdev for variant rs1291348 from --glm genotypic
#CHROM  POS     ID      REF     ALT     PROVISIONAL_REF?        A1      OMITTED A1_FREQ TEST    OBS_CT  BETA    SE      T_OR_F_STAT     P       ERRCODE
12      13091942        rs1291348       A       G       Y       G       A       0.128724        ADD     184130  -0.310339       0.0612034       -5.07062        3.96902e-07     .
12      13091942        rs1291348       A       G       Y       G       A       0.128724        DOMDEV  184130  0.36828 0.0692663       5.31686 1.05694e-07     .
12      13091942        rs1291348       A       G       Y       G       A       0.128724        GENO_2DF        184130  NA      NA      14.706  4.10925e-07     .



20250926_height_sex_diff_dom_snps.log
height.dom.test.log
height.add.test.log
beta_cov.r

Chris Chang

unread,
Sep 26, 2025, 7:53:24 PM (23 hours ago) Sep 26
to Luis, plink2-users
Please provide a toy dataset (could have just a few samples) which exhibit the discrepancy you're talking about, and which your commands/scripts can be executed on.

--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/plink2-users/5b8ef33c-77c4-446d-b61b-b9a06292d5c8n%40googlegroups.com.

Luis

unread,
Sep 26, 2025, 11:37:18 PM (19 hours ago) Sep 26
to plink2-users

I figured out the problem -- I coded the missing phenotype as “-9” when running the GWAS but didn’t remove those individuals when running the regression in R...

Reply all
Reply to author
Forward
0 new messages