LDSC, latent factor GWAS, etc

671 views
Skip to first unread message

Johan Zvrskovec

unread,
Jun 25, 2020, 7:10:04 AM6/25/20
to Genomic SEM Users

Hi,

 We have a couple of questions on Genomic SEM we have been wondering about in our group.

 What are the differences between the original LDSC and the multivariate LDSC in Genomic SEM, and how are the new additions used later? It would be great if someone could give us a clarification in addition to how the method is described in the paper. Are there other differences apart from the estimation of the sampling covariances?

For performing latent factor GWAS with Genomic SEM, would it be possible to include automatic checks and warnings when the original dataset effect standard errors are outside of what is expected from their supposed scale? Would it be possible to add the functionality to automatically convert the standard errors between scales?

There are, or were at least, some R-packages that were able to create path diagrams using lavaan results. Would it be possible to make lavaan results corresponding to the Genomic SEM results available as part of the usermodel output?

How would you interpret a latent factor Genomic SEM model where the latent factors are uncorrelated? Would it be viable in any situation to set latent factors as uncorrelated, and how would it be useful?

Best,
Johan and other GSEM-users in the TNG-group

Elliot Tucker-Drob

unread,
Jun 25, 2020, 12:02:51 PM6/25/20
to Johan Zvrskovec, Genomic SEM Users
Hi Johan,
Thanks for the questions and feedback. Please see comments below.
All the best,
Elliot

 What are the differences between the original LDSC and the multivariate LDSC in Genomic SEM, and how are the new additions used later? It would be great if someone could give us a clarification in addition to how the method is described in the paper. Are there other differences apart from the estimation of the sampling covariances?

Apart from some minor differences in QC defaults, the only substantive difference is that multivariate LDSC produces the off-diagonals of V, i.e. the sampling covariances of the estimates, (and also conveniently produces S as a full matrix rather than as a sequence of pairwise rGs). Note that multivariate ldsc function can now produce S and V in standardized forms (i.e. an rG matrix and its associated sampling covariance matrix) using  the argument 'stand'. You can of course verify that results are nearly equivalent by running the same set of sumstats through both programs and comparing the point estimates for h2, rg, and their SEs. We have done this rather extensively.

For performing latent factor GWAS with Genomic SEM, would it be possible to include automatic checks and warnings when the original dataset effect standard errors are outside of what is expected from their supposed scale? Would it be possible to add the functionality to automatically convert the standard errors between scales?

We have automated some detection of whether effect sizes are betas or ORs, but it is difficult to know whether the betas are logistic, OLS, or linear probability model, and whether the SEs are provided for betas or ORs without well-documented sumstats from their original sources. If p, SE, and effect size are provided, you can determine whether the effect size is indeed a regression beta or an OR by examining whether the values are distributed closer to 1 (null for OR) or 0 (null for beta). You can then determine whether the SE is for the beta or OR by examining whether you recapture the reported p value using the beta/SE (or [OR-1]/SE ) ratio with or without converting the SE via the formula we provide in our paper (logistic SE = odds ratio SE/OR). This is something that we may be able to further automate down the road.

Note that we already do convert estimates to and STDY scale (i.e. expected sd difference in Y per 1 unit shift in the effect allele), and we provide capability to convert linear probability model coefficients and their SEs to logistic coefficients and SEs.

There are, or were at least, some R-packages that were able to create path diagrams using lavaan results. Would it be possible to make lavaan results corresponding to the Genomic SEM results available as part of the usermodel output?

We have discussed this and it is something that is of potential interest to us, but may be rather involved on the programming side. I am personally of the opinion that forcing a user to translate input and output manually to a path diagram that they draw themselves helps to ensure that users have developed a higher level of the understanding of the models they are fitting. I also find automatically generated path diagrams to be suboptimal for publications.

How would you interpret a latent factor Genomic SEM model where the latent factors are uncorrelated? Would it be viable in any situation to set latent factors as uncorrelated, and how would it be useful?

That's a difficult question to provide a blanket answer to, as it depends on the features of model as a whole. If fitting a cholesky decomposition, or a GWAS-by-subtraction (https://www.biorxiv.org/content/10.1101/2020.01.14.905794v1), then it is important to specify that latent factors are uncorrelated, so as to properly interpret the downstream latent variables as independent components of vairance, or residuals. If fitting a hierarchical model or bifactor model, the correlations among the indicators is subsumed by the general factor, and the domai-specific factors (or their residuals) should therefore be specified as orthogonal to ensure proper interpretation and model identification. If fitting a correlated factors model with simple structure (i.e. no cross-loadings), then I think it may be overly restrictive to fix the correlations between the factors to 0 by assumption rather than freely estimate them. Generally, this is the sort of question that is more generally relevant to SEM, not just Genomic SEM, so I would advise consulting a textbook, such as Kline (Principles and Practice of Structural Equation Modeling) or Loehlin (Latent Variable Models).

Best,
Johan and other GSEM-users in the TNG-group

--
You received this message because you are subscribed to the Google Groups "Genomic SEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genomic-sem-us...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genomic-sem-users/fda07f0c-629c-4650-a715-ee0a3a43ee68n%40googlegroups.com.

Johan Zvrskovec

unread,
Jun 25, 2020, 1:49:49 PM6/25/20
to Genomic SEM Users
Thank you Elliot for these comprehensive answers! I will notify the group.
Kind regards,
Johan

Michel Nivard

unread,
Jun 29, 2020, 10:37:39 AM6/29/20
to Genomic SEM Users
Hi Johan,

let me add my 2cents, I am not didactically opposed to automatic plotting of SEM models. In fact I looked into it and I think the easiest way is for someone to make a function that takes genomicSEM output and feeds it to semplot (http://sachaepskamp.com/semPlot) but I don't have the time for that. Feel free to try and make it work and publishing the code to github, if you do I'll pitch in where I can by awnsering specific questions about how to make genomicSEM work with semPlot.


Best,
Michel

Johan Zvrskovec

unread,
Jul 2, 2020, 12:24:19 PM7/2/20
to Genomic SEM Users
Hi Michel,

I have already tried, unsuccessfully, to expose lavaan object output from Genomic SEM as part of the return object from usermodel for example, to be used directly in semPlot. For some reason semPlot was not compatible with the lavaan object. I can do another try to make it work when I have time. Thanks, I will come come back with questions about the code if I get stuck.

There seems to be a new plotting package available now as well: https://cjvanlissa.github.io/tidySEM/

Best,
Johan

Michel Nivard

unread,
Jul 27, 2020, 3:13:10 PM7/27/20
to Genomic SEM Users
great Idea I got in touch with the tidySEM people and will see about using their function to plot.

Best,
Michel

Nicola Pirastu

unread,
Oct 27, 2020, 4:32:29 AM10/27/20
to Genomic SEM Users
Not sure if this has been solved meanwhile but here is a solution using semPlot
It will take the STD_All as default but you can specify whichever column you want.

semPlotModel_GSEM=function(gsem.object=GWISoutput , est.label="STD_All"){ 
      
       object=gsem.object$results
       object$free=0
       numb=1:length(which(object$op!="~~"))
       object$free[which(object$op!="~~")]=numb
       varNames <- lavaanNames(object, type = "ov")
       factNames <- lavaanNames(object, type = "lv")
       factNames <- factNames[!factNames %in% varNames]
       n <- length(varNames)
       k <- length(factNames)
       if (is.null(object$label)) 
         object$label <- rep("", nrow(object))
       semModel <- new("semPlotModel")
       object$est <- object[,est.label]
       if (is.null(object$group)) 
         object$group <- ""
       semModel@Pars <- data.frame(label = object$label, lhs = ifelse(object$op == 
                                                                        "~" | object$op == "~1", object$rhs, object$lhs), edge = "--", 
                                   rhs = ifelse(object$op == "~" | object$op == "~1", object$lhs, 
                                                object$rhs), est = object$est, std = NA, group = object$group, 
                                   fixed = object$free==0, par = object$free, stringsAsFactors = FALSE)
       semModel@Pars$edge[object$op == "~~"] <- "<->"
       semModel@Pars$edge[object$op == "~*~"] <- "<->"
       semModel@Pars$edge[object$op == "~"] <- "~>"
       semModel@Pars$edge[object$op == "=~"] <- "->"
       semModel@Pars$edge[object$op == "~1"] <- "int"
       semModel@Pars$edge[grepl("\\|", object$op)] <- "|"
       semModel@Thresholds <- semModel@Pars[grepl("\\|", semModel@Pars$edge), 
                                            -(3:4)]
       semModel@Pars <- semModel@Pars[!object$op %in% c(":=", "<", 
                                                        ">", "==", "|", "<", ">"), ]
       semModel@Vars <- data.frame(name = c(varNames, factNames), 
                                   manifest = c(varNames, factNames) %in% varNames, exogenous = NA, 
                                   stringsAsFactors = FALSE)
       semModel@ObsCovs <- list()
       semModel@ImpCovs <- list()
       semModel@Computed <- FALSE
       semModel@Original <- list(object)
       return(semModel)
       
}

Then with 

semPaths(semplot_GSEM(GWISoutput),layout="tree2",whatLabels = "est")

you will get your plot.

I think you can play with most of the other options as well.

Best

Nicola



Michel Nivard

unread,
Oct 27, 2020, 5:53:58 AM10/27/20
to Nicola Pirastu, Genomic SEM Users
Hi Nicola,

That is super useful! Mind if I build it into GenomicSEM? If you want the “credits” you can add it to the github via a Pull request.

Best,
Michel

Op di 27 okt. 2020 om 09:32 schreef Nicola Pirastu <dart...@gmail.com>

Nicola Pirastu

unread,
Oct 27, 2020, 6:01:24 AM10/27/20
to Michel Nivard, Genomic SEM Users
Oh no go ahead I always get confused with GitHub and honestly it didn’t take me much time. 
You should credit semPlot as I basically reversed engineered his code.

Only thing I can’t promise it will work with any model, I don’t see why it should not but I have cheated my way around a few variables (ie. object$free which I have no clue what it is).

Funny part is that I can’t use it as I can’t get semPlot installed on the cluster…

Cheers

Nicola

Johan Zvrskovec

unread,
Nov 23, 2020, 1:54:44 PM11/23/20
to Genomic SEM Users
Thank you for this Nicola! Will definitely look at your solution when I need to plot path diagrams in a while.
Reply all
Reply to author
Forward
0 new messages