Creating a latent polygenic risk score from two PRSs using SEM

ahmad valikhani

Apr 23, 2025, 8:06:52 AM
to Genomic SEM Users
Hi,
I have a question regarding the possibility of creating a latent polygenic risk score (PRS) from two already calculated PRS variables, without using Genomic SEM.

I’m working on a mediation model using structural equation modelling (SEM), where a latent polygenic risk factor (PG_AB) predicts latent phenotypes (A and B), which in turn predict another latent phenotype (C), and ultimately a final latent outcome (Y).

My question is whether it is appropriate to define a latent polygenic risk factor (PG_AB) from two individual PRS variables (calculated using PLINK) without relying on Genomic SEM, using a syntax like the following in lavaan:

model <- '
  PG_AB =~ PG_A + PG_B
'
Would this be considered methodologically sound in the context of SEM, or would you recommend an alternative approach?

Thank you in advance for your guidance!
Best wishes,
Ahmad

Elliot Tucker-Drob

Apr 23, 2025, 8:38:03 AM
to ahmad valikhani, Genomic SEM Users
It would be biased if the polygenic scores are based on samples that overlap. If they are fully independent, then yes, this can be sensible if the goal is to use the latent variable to predict an external outcome. However, it would be most sensible to use at least 3 indicators to identify the model without making the assumption that the PGIs equally represent the latent factor (the loading will be affected both by the "true" loading of the genetic component and by the amount of error in the score, which is itself affected by the power of the GWAS discovery sample). See the following for one such application:
Tucker-Drob, E. M. (2017). Measurement error correction of genome-wide polygenic scores in prediction samples. bioRχiv (preprint).
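
To illustrate the identification point, here is a minimal lavaan sketch on simulated data. The data, the third score PG_C, the factor name PG_ABC, and the label l are assumptions for illustration only; none of them come from the thread. With two indicators the loadings have to be constrained equal for a standalone factor to be identified, whereas with three indicators each loading can be estimated freely.

```r
library(lavaan)

set.seed(1)
n <- 2000
g <- rnorm(n)  # simulated shared genetic liability (hypothetical)

# Simulated scores: each is the shared component plus score-specific error.
dat <- data.frame(
  PG_A = 0.6 * g + rnorm(n, sd = 0.8),
  PG_B = 0.6 * g + rnorm(n, sd = 0.8),
  PG_C = 0.5 * g + rnorm(n, sd = 0.87)  # hypothetical third PGI
)

# Two indicators: the shared label "l" forces equal loadings.
# std.lv = TRUE fixes the factor variance to 1 instead of fixing a loading.
model_two <- '
  PG_AB =~ l*PG_A + l*PG_B
'
fit_two <- cfa(model_two, data = dat, std.lv = TRUE)

# Three indicators: loadings are free, so scores with different amounts of
# error can load differently on the factor.
model_three <- '
  PG_ABC =~ PG_A + PG_B + PG_C
'
fit_three <- cfa(model_three, data = dat, std.lv = TRUE)

parameterEstimates(fit_two)[, c("lhs", "op", "rhs", "est")]
parameterEstimates(fit_three)[, c("lhs", "op", "rhs", "est")]
```

Both models are just-identified (0 degrees of freedom), so neither tests the factor structure on its own; the difference is only whether the equal-loading assumption is imposed.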

ahmad valikhani

Apr 23, 2025, 9:08:09 AM
to Genomic SEM Users
Thank you so much for your response.

I wasn’t quite sure what you meant by “overlapping sample.” Just to clarify, I created PRS for autism and ADHD using different GWAS datasets, one for autism and one for ADHD, but both were applied to the same study sample. So, while the GWAS sources were different, the target sample was the same.

Best wishes,
Ahmad

Elliot Tucker-Drob

Apr 23, 2025, 9:25:31 AM
to ahmad valikhani, Genomic SEM Users
I meant overlap in the discovery GWAS. Autism cases could also be ADHD cases and contribute to both GWAS, for example, or the controls could be shared. You can empirically estimate whether this is a concern by examining the off-diagonal of the I matrix from ldsc() of the two contributing GWAS and ensuring that it is very close to 0.
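
A minimal sketch of that check, assuming the intercept matrix has already been obtained from GenomicSEM's ldsc(). The helper check_overlap, the tolerance of 0.02, and the toy matrices are hypothetical; the thread gives no numeric cutoff for "very close to 0".

```r
# In practice the matrix would come from GenomicSEM, for example:
#   ldsc_out <- GenomicSEM::ldsc(traits = ..., sample.prev = ...,
#                                population.prev = ..., ld = ..., wld = ...,
#                                trait.names = c("AUT", "ADHD"))
#   I <- ldsc_out$I
# The arguments are left elided here because the thread does not give them.

# Hypothetical helper: reports the largest absolute cross-trait intercept
# and whether all off-diagonal elements fall within `tol` of zero.
check_overlap <- function(I, tol = 0.02) {
  off <- I[upper.tri(I)]
  list(max_abs = max(abs(off)), ok = all(abs(off) < tol))
}

# Toy matrices (made-up values, not real LDSC output).
I_independent <- matrix(c(1.02, 0.004,
                          0.004, 1.01), nrow = 2, byrow = TRUE)
I_overlapping <- matrix(c(1.02, 0.35,
                          0.35, 1.01), nrow = 2, byrow = TRUE)

check_overlap(I_independent)
check_overlap(I_overlapping)
```

The diagonal elements are the univariate intercepts and are ignored here; only the off-diagonal (cross-trait) intercept speaks to shared participants between the two discovery GWAS.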

ahmad valikhani

Apr 23, 2025, 9:48:20 AM
to Genomic SEM Users
Thank you for the clarification.

How about including both PRSs as two observed variables in the SEM model and allowing them to correlate? I assume this approach avoids the assumptions needed to model a latent factor.

Best wishes,
Ahmad

Elliot Tucker-Drob

Apr 23, 2025, 10:03:17 AM
to ahmad valikhani, Genomic SEM Users
We're getting pretty far afield from Genomic SEM at this point, but you can certainly put two PGIs in a regression and treat them as correlated predictors. Note that the PGIs have measurement error (and do not index rare genetic variation), so controlling for a PGI does not fully control for the genetic variation in the phenotype that it indexes.
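
The observed-variable alternative could be sketched in lavaan as below. The simulated data, the effect sizes, the single observed outcome Y, and the labels b_A and b_B are assumptions for illustration; the full mediation model from the original question is not reproduced.

```r
library(lavaan)

set.seed(2)
n <- 2000
g <- rnorm(n)  # simulated shared component that makes the PGIs correlate

dat <- data.frame(
  PG_A = 0.6 * g + rnorm(n, sd = 0.8),
  PG_B = 0.6 * g + rnorm(n, sd = 0.8)
)
# Hypothetical outcome with made-up effects of each score.
dat$Y <- 0.3 * dat$PG_A + 0.2 * dat$PG_B + rnorm(n)

# Both PGIs enter as observed predictors with a freely estimated covariance.
model_obs <- '
  Y ~ b_A*PG_A + b_B*PG_B
  PG_A ~~ PG_B
'
# fixed.x = FALSE makes lavaan estimate the predictors' variances and
# covariance as model parameters rather than fixing them to sample values.
fit_obs <- sem(model_obs, data = dat, fixed.x = FALSE)

parameterEstimates(fit_obs)[, c("lhs", "op", "rhs", "est", "se")]
```

This model imposes no factor structure on the two scores, but each coefficient is still estimated from an error-laden PGI, so the caveat about incomplete control for genetic variation applies here too.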