another question about estimating SNP effects on residuals

Holly Poore

May 4, 2026, 8:31:21 PM
to Genomic SEM Users
Hi all - 

I am trying to estimate SNP effects on residual substance use disorders (PAU, CUD, and TUD), net of what they share with externalizing and internalizing. In the example model shown in the syntax and diagram below, I am estimating effects on residual TUD, but I have done this separately for PAU and CUD as well.

When I run these summary statistics through FUMA, the results look reasonable (for example, big hits on chr4 for PAU and chr15 for TUD, but no other real hits). However, when I use these summary statistics to create polygenic scores (PGS), the PGS do not show the phenotype-specific associations I would expect (e.g., the residual TUD PGS is not more strongly related to TUD outcomes than to PAU or CUD outcomes).

In the past, I've estimated effects on residual SUDs in a similar model, except that I residualized only on externalizing (internalizing was not in the model at all). Those summary statistics produced PGS with the phenotype-specific associations I would expect. Is there something about adding a second factor that changes things? Or does anything about this syntax or model seem off to you all? Please let me know if my explanation is unclear.

Thanks very much for your time,
Holly

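library(GenomicSEM)

# EXT and INT factors, with TUD cross-loading on INT. The first loading on
# each factor is fixed to 1, so the factor variances (NA*) are freely estimated;
# the ptsd residual variance is labeled "a" and constrained to exceed .001.
# Regressing EXT, INT, and tud on the SNP makes tud ~ SNP the SNP effect
# on TUD net of EXT and INT.
model <- "EXT =~ 1*smok + adhd + asex + cann + nsex + risk + pau + cud + tud
          INT =~ 1*mdd25 + anx_MA + neuro + ptsd + tud

          ptsd ~~ a*ptsd

          a > .001

          EXT ~~ NA*EXT
          INT ~~ NA*INT

          EXT ~ SNP
          INT ~ SNP
          tud ~ SNP"

# sub = "tud~SNP" returns only the direct (residual) path for each SNP
results <- userGWAS(LDSCoutput,
                    sumstats,
                    estimation = "DWLS",
                    model = model,
                    printwarn = TRUE,
                    sub = c("tud~SNP"),
                    cores = 1,
                    toler = 1e-50)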

[Attached figure: gsem_q.png, path diagram of the model]

Elliot Tucker-Drob

May 6, 2026, 9:53:28 AM
to Holly Poore, Genomic SEM Users
Hi Holly,

Your figure doesn't appear to have paths from the SNP to the internalizing and externalizing factors, just to TUD. In such a model, the SNP effect on TUD is not incremental to its effects on INT and EXT (i.e., INT and EXT are not controlled for in the TUD ~ SNP regression). However, your code indicates that these paths are estimated. I always like to check the output to be sure that the parameters I think my code is estimating are indeed being estimated. But assuming that this is the case, and that all estimates look sensible given what you know about the data (i.e., the model seems to have converged appropriately), then it does seem that you are doing what you intended.
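As a cheap first pass before inspecting the output itself, one can list the regression paths written in the model string and confirm that the SNP predicts EXT and INT as well as tud; only then is the tud ~ SNP estimate conditional on the factors. The sketch below uses base R string handling only, and `list_regressions()` is a hypothetical helper, not a GenomicSEM or lavaan function. It assumes one statement per line (no `;` separators), and it checks only the syntax: whether userGWAS actually returned sensible estimates for those paths still has to be confirmed in the results, for example on a handful of SNPs.

```r
# Hypothetical helper (not a GenomicSEM/lavaan function): list the
# regression statements ("lhs ~ rhs") in a lavaan-style model string.
# Assumes one statement per line; loadings (=~), (co)variances (~~),
# and constraint lines (e.g. "a > .001") are skipped.
list_regressions <- function(model) {
  lines <- unlist(strsplit(model, "\n", fixed = TRUE))
  lines <- trimws(sub("#.*$", "", lines))  # drop comments and padding
  lines <- lines[nzchar(lines)]
  is_reg <- grepl("~", lines, fixed = TRUE) &
    !grepl("=~", lines, fixed = TRUE) &
    !grepl("~~", lines, fixed = TRUE)
  parts <- strsplit(lines[is_reg], "~", fixed = TRUE)
  data.frame(
    lhs = trimws(vapply(parts, `[`, character(1), 1)),
    rhs = trimws(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

model <- "EXT =~ 1*smok + adhd + asex + cann + nsex + risk + pau + cud + tud
          INT =~ 1*mdd25 + anx_MA + neuro + ptsd + tud
          ptsd ~~ a*ptsd
          a > .001
          EXT ~~ NA*EXT
          INT ~~ NA*INT
          EXT ~ SNP
          INT ~ SNP
          tud ~ SNP"

regs <- list_regressions(model)
snp_targets <- regs$lhs[regs$rhs == "SNP"]
snp_targets  # "EXT" "INT" "tud"

# tud ~ SNP is incremental to the factors only if both factor paths are present
stopifnot(all(c("EXT", "INT", "tud") %in% snp_targets))
```

If the figure and the code disagree, a check like this settles what the code specifies; the figure can then be redrawn to match.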

You mention that you get a hit on chr15 for TUD, which I assume is the Mr. Big hit for tobacco, and on chr4 for PAU, which I assume is alcohol dehydrogenase for alcohol. It's certainly possible that the substance-specific pathways are isolated to a few of these large-effect variants operating through well-understood core pathways, and that the vast majority of the remaining polygenic propensity for different substance use phenotypes operates through a mixture of internalizing and externalizing. It's also possible that after conditioning on EXT and INT, you simply don't have the power to produce a predictive PGI (i.e., there are substance-specific polygenic effects, but your PGI is low powered). Did you check the h2SNP of this direct path to TUD? That would give you a clue as to how much signal you are working with.
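One way to follow up on the h2SNP question is to convert the residual-path summary statistics into an approximate effective sample size, which can then be supplied when munging them for LDSC. The sketch below assumes the approximation that, as far as we know, the Genomic SEM tutorials suggest for factor GWAS: for a standardized outcome, SE^2 is roughly 1 / (2 * MAF * (1 - MAF) * N), so N is solved for SNP by SNP and averaged over SNPs with MAF between .1 and .4. Applying it to a direct path assumes the indicator is on a roughly unit-variance scale. The column names `MAF` and `SE` are assumptions about the userGWAS output, and `effective_n()` is a hypothetical helper.

```r
# Hypothetical helper: approximate effective N for SNP effects from userGWAS.
# Assumes columns "MAF" and "SE" and an outcome with variance of about 1.
effective_n <- function(res, maf_col = "MAF", se_col = "SE",
                        maf_range = c(0.1, 0.4)) {
  maf <- res[[maf_col]]
  se <- res[[se_col]]
  # Keep common SNPs, where the approximation tends to be most stable
  keep <- !is.na(maf) & !is.na(se) &
    maf >= maf_range[1] & maf <= maf_range[2]
  if (!any(keep)) stop("no SNPs with non-missing SE in the MAF range")
  # SE^2 ~ 1 / (2 * MAF * (1 - MAF) * N)  =>  N ~ 1 / (2 * MAF * (1 - MAF) * SE^2)
  mean(1 / (2 * maf[keep] * (1 - maf[keep]) * se[keep]^2))
}

# Toy check with simulated standard errors (not real data): if every SE
# is generated from N = 50,000, the function should recover 50,000.
set.seed(1)
toy_maf <- runif(1000, 0.05, 0.5)
toy <- data.frame(MAF = toy_maf,
                  SE = 1 / sqrt(2 * toy_maf * (1 - toy_maf) * 50000))
effective_n(toy)  # approximately 50000
```

With real results, `res` would be the data frame of `tud~SNP` estimates returned by userGWAS (for example `results[[1]]` when `sub` has a single entry). The resulting N, together with those summary statistics, can then go through `munge()` and `ldsc()` to get an h2SNP estimate for the direct path; a small value with a wide interval would point toward the low-power explanation.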

The other thing that seems a bit strange to me is that you say you have done this for the other disorders as well (PAU, CUD, TUD). However, your model with a direct effect on TUD allows a cross-loading on INT only for TUD, not for PAU and CUD. That suggests to me that you are changing the measurement model each time you estimate a direct effect on a different substance use phenotype, which redefines the factors each time. I think a sensible approach would be to settle on a measurement model first, and then add the SNP to the model and allow for direct effects (in fact, you can allow for multiple direct effects simultaneously; you just can't do it for all indicators at once).

If you don't have an a priori sense of which cross-loadings to allow, you can use a "modification indices"-style approach, in which you fit the model with simple structure first and then inspect the residuals to determine for which substance use phenotypes a cross-loading on INT would be appropriate. Yavor Dragostinov took such an approach for his measurement model of the Big Five for the Schwaba et al. ReGPC preprint (see the supplement: 3.5 Genomic SEM Analyses Stratified by Measurement Instrument).
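The two-step workflow described above can be sketched in base R. Step 1 compares an observed genetic covariance matrix with the one implied by a simple-structure EXT/INT model and averages each substance use indicator's residual covariance with the INT indicators; a clearly non-zero value suggests a cross-loading on INT. Step 2 writes one fixed measurement model and adds direct SNP effects on several indicators at once (not on all of them). Everything here is illustrative: the toy matrices are made up rather than estimated, the 0.05 cut-off is arbitrary, and `int_residuals()` and `build_snp_model()` are hypothetical helpers. With real data, `S` would come from the LDSC output and `Sigma` from the fitted simple-structure model.

```r
# Step 1 (hypothetical helper): mean residual covariance (observed minus
# model-implied) between each candidate indicator and the INT indicators.
int_residuals <- function(S, Sigma, candidates, int_indicators) {
  R <- S - Sigma
  sapply(candidates, function(v) mean(R[v, int_indicators]))
}

# Toy matrices, made up for illustration: simple-structure implied matrix
vars <- c("smok", "pau", "tud", "mdd25", "neuro")
Lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,      # EXT loadings
                   0,   0,   0,   0.8, 0.7),   # INT loadings
                 ncol = 2, dimnames = list(vars, c("EXT", "INT")))
Phi <- matrix(c(1, 0.4, 0.4, 1), 2)            # factor correlation .4
Sigma <- Lambda %*% Phi %*% t(Lambda)
diag(Sigma) <- 1                               # residual variances fill the diagonal
# "Observed" toy matrix: tud shares extra covariance with the INT indicators
S <- Sigma
S["tud", c("mdd25", "neuro")] <- S["tud", c("mdd25", "neuro")] + 0.1
S[c("mdd25", "neuro"), "tud"] <- S[c("mdd25", "neuro"), "tud"] + 0.1

res <- int_residuals(S, Sigma, c("pau", "tud"), c("mdd25", "neuro"))
flagged <- names(res)[abs(res) > 0.05]  # arbitrary threshold
flagged  # "tud"

# Step 2 (hypothetical helper): one fixed measurement model plus SNP paths
# to both factors and to several (not all) indicators.
build_snp_model <- function(measurement, direct) {
  paste(c(measurement, "EXT ~ SNP", "INT ~ SNP", paste(direct, "~ SNP")),
        collapse = "\n")
}

# Same measurement model for every run; the tud cross-loading is kept
# here only because the toy step 1 flagged it.
measurement <- "EXT =~ 1*smok + adhd + asex + cann + nsex + risk + pau + cud + tud
INT =~ 1*mdd25 + anx_MA + neuro + ptsd + tud
ptsd ~~ a*ptsd
a > .001
EXT ~~ NA*EXT
INT ~~ NA*INT"

direct <- c("pau", "cud", "tud")
snp_model <- build_snp_model(measurement, direct)
snp_sub <- paste0(direct, "~SNP")  # "pau~SNP" "cud~SNP" "tud~SNP"

# With real inputs, the call would follow the same pattern as the original post:
# results <- userGWAS(LDSCoutput, sumstats, estimation = "DWLS",
#                     model = snp_model, sub = snp_sub, cores = 1)
```

Fitting all three direct effects in one model keeps EXT and INT defined identically for PAU, CUD, and TUD, so the residual PGS are built from the same factors.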

All the best,

Elliot


--
You received this message because you are subscribed to the Google Groups "Genomic SEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genomic-sem-us...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/genomic-sem-users/88069293-6bb2-48bc-8a52-750ddf25f79en%40googlegroups.com.