Genomic SEM v0.0.5 commonfactor() Singular Error with HS Rat Data


Apurva Chitre

Mar 21, 2024, 4:54:38 PM
to Genomic SEM Users
I'm encountering an issue with Genomic SEM version 0.0.5. When using the commonfactor(covstruc = LDSCoutput, estimation = "DWLS") function, I receive the following error:

[1] "Running Model"
[1] "Calculating CFI"
Error in solve.default(t(S2.delt_Q_CFI) %*% S2.W_Q_CFI %*% S2.delt_Q_CFI) :
  Lapack routine dgesv: system is exactly singular: U[16,16] = 0


This error doesn't occur in version 0.0.3.

My research involves HS rats, which have wider LD compared to humans. The sample sizes for the traits under analysis range from 629 to 2487. Additionally, the GCTA-derived SNP heritability estimates for these traits range from 0.141 (SE ± 0.049) to 0.382 (SE ± 0.067). Given these specifics, I am looking for feedback on the method's suitability for this particular population.
I have attached the ldsc() output. 
Appreciate any guidance or insights you can provide.

Thank you. 
ldsc_gSEM_locomotor.RData

Elliot Tucker-Drob

Mar 22, 2024, 10:58:28 AM
to Apurva Chitre, Genomic SEM Users
I have not used LDSC (or Genomic SEM) with nonhuman data, so I can't provide much guidance on that front. I assume you are following best practices for estimating rG in HS rats and have an appropriate set of LD scores. The Ns certainly seem small by human GWAS standards, so that is a concern.

Below is how I would go about inspecting the ldsc output to see whether it is sensible to move forward with a Genomic SEM model with these data.

We can get a sense of the rGs by converting the S matrix to an R matrix using cov2cor (you could have requested standardized LDSC output when you ran it, but it appears that you didn't, so we'll create the R matrix after the fact):

> cov2cor(LDSCoutput$S)
     p50_david_dietz_loco_distance p50_hao_chen_open_field_first_15 p50_shelly_flagel_2014_d1_total_dist
[1,]                     1.0000000                         47.10806                            0.4141109
[2,]                    47.1080599                          1.00000                           60.4659724
[3,]                     0.4141109                         60.46597                            1.0000000
[4,]                     1.4610404                         97.90025                            1.1392513
[5,]                     1.2685243                        160.74589                            1.1426221
[6,]                     0.7605677                         37.00778                            0.8759522
     u01_peter_kalivas_oft_distance_1_first_15 u01_suzanne_mitchell_locomotor_t1_total_distance u01_tom_jhou_locomotor1
[1,]                                 1.4610404                                        1.2685243               0.7605677
[2,]                                97.9002515                                      160.7458944              37.0077836
[3,]                                 1.1392513                                        1.1426221               0.8759522
[4,]                                 1.0000000                                        2.5493044               0.3537181
[5,]                                 2.5493044                                        1.0000000               0.2220304
[6,]                                 0.3537181                                        0.2220304               1.0000000

There appear to be many out-of-bounds estimates, which can stem from very low h2s and/or low power.
The h2s are the diag of S:

> diag(LDSCoutput$S)
[1] 0.21967473984 0.00002519852 0.15855032707 0.02835159616 0.07015541286 0.11715178105

The 2nd and 4th variables are particularly concerning to me here. They also appear to be the biggest offenders with respect to rGs >> 1.

Let's look at the SEs of those h2s...

k<-nrow(LDSCoutput$S)
SE<-matrix(0, k, k)
SE[lower.tri(SE,diag=TRUE)] <-sqrt(diag(LDSCoutput$V))
diag(SE)
[1] 0.05769571 0.05163793 0.06338283 0.08752559 0.14561862 0.09478580

Looks like some are rather large, indicating possibly low power. Let's see whether the h2s are significant by inspecting their Z stats.

> diag(LDSCoutput$S)/diag(SE)
[1] 3.8074709510 0.0004879847 2.5014714039 0.3239234991 0.4817750062 1.2359634498

Looks like only phenotypes 1 and 3 have significant h2. So I think power is certainly an issue here. It's hard to model genetic covariance structure if you can't detect significant genetic signal to start with. 
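As a side note (a sketch of my own, not part of the original message): the same lower-triangle mapping used for the SEs extends to Z statistics for every unique element of S, genetic covariances included, which can help pinpoint which pairwise estimates carry any signal:

```r
# Sketch: Z statistics for all unique elements of S (h2s on the
# diagonal, genetic covariances off it), using the SE matrix built
# from the diagonal of V as shown above.
k <- nrow(LDSCoutput$S)
SE <- matrix(0, k, k)
SE[lower.tri(SE, diag = TRUE)] <- sqrt(diag(LDSCoutput$V))
Z <- matrix(NA, k, k)
lt <- lower.tri(Z, diag = TRUE)
Z[lt] <- LDSCoutput$S[lt] / SE[lt]
round(Z, 2)
```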

Finally, let's look at the LDSC intercepts.

> diag(LDSCoutput$I)
[1] 1.180277 1.366183 1.077194 1.104384 1.167949 1.060990

Looks like there is some inflation for variable 2 in particular. But many of the others are still a good bit above one too...

To conclude, there are certainly some red flags with respect to the data. It's possible that the implementation of LDSC in HS rats was suboptimal, but also possible that the N just isn't large enough, or that some of the phenotypes aren't appreciably heritable (which can happen if they are measured with lots of error). One thing that could be tried to increase power is HDL. Michel provides a tutorial on that on our GitHub wiki. However, this would require using appropriate LD matrices for the HS rats.

I hope that helps.

Elliot



Brittany Leger

Mar 27, 2024, 6:34:09 PM
to Genomic SEM Users
Hi Elliot, 
I am in the same group as Apurva, and have had similarly poor results running LDSC with other rat datasets (though I did not receive the same matrix multiplication error). However, as part of our pipeline, we use GCTA-GREML to calculate genetic correlation and heritability, which produces much more reasonable results than LDSC. Do you think it is feasible to use the output from REML for Genomic SEM, or are there particulars about LDSC and HDL that make them especially suited to use in Genomic SEM?

best,
Brittany Leger

Michel Nivard

Mar 27, 2024, 6:41:15 PM
to Brittany Leger, Genomic SEM Users
Hi,

I think it’s fine to use REML results as input; there could be good genetic-architecture reasons for REML to work where LDSC doesn’t (especially in nonhumans).

You’d need to construct a covariance matrix S, and a matrix “V” that is the parameter variance-covariance matrix for that covariance matrix. The trick is to make sure the entries in V are in the right order.


There is some info here () on mapping entries of the diagonal of V to related entries in S (the covariance matrix); you could use that to get started mapping from the standard errors for entries in S to V.
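To make that concrete, here is a minimal sketch (my addition; all numbers are placeholders, not estimates from this thread) of assembling the two matrices from GREML-style point estimates and SEs for two hypothetical traits:

```r
# Sketch: building Genomic SEM input (S and V) from external
# h2 / cov_g estimates. Values below are invented placeholders.
traits <- c("trait1", "trait2")
S <- matrix(c(0.22, 0.10,
              0.10, 0.16),
            2, 2, dimnames = list(traits, traits))

# SEs for the unique elements of S, in lower-triangle,
# column-major order: s11, s21, s22.
se <- c(0.05, 0.04, 0.06)

# Diagonal V holds the squared SEs. A purely diagonal V ignores
# covariances among the estimation errors; a method such as mgreml
# could supply those off-diagonal entries.
V <- diag(se^2)

LDSCoutput <- list(S = S, V = V)
```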


Best,
Michel



Elliot Tucker-Drob

Mar 27, 2024, 7:21:03 PM
to Michel Nivard, Brittany Leger, Genomic SEM Users
V contains squared SEs on the diagonal, so those can easily come from GCTA. But I don't know whether GCTA provides the covariance matrix of the parameter estimates needed to populate the rest of V: the dependencies among estimation errors, which result from sample overlap, and from the fact that the same sample contributes to both a trait's h2 and its cov_g estimates with other traits even in the absence of sample overlap. I believe that the following version of GREML can produce it:

https://github.com/devlaming/mgreml

As Michel said, you'd need to enter the relevant estimates in V in the correct order that corresponds with the elements of S.

Here below is a bit more on how to determine the order, taken from a previous thread: https://groups.google.com/g/genomic-sem-users/c/yDfFhIbxJ6k/m/FSOywG-cAwAJ

"On page 3 of the wiki, we show how to get the SEs of each element of S:


k<-nrow(LDSCoutput$S)
SE<-matrix(0, k, k)
SE[lower.tri(SE,diag=TRUE)] <-sqrt(diag(LDSCoutput$V))

In other words, the SEs (in V) corresponding to each element of S can
be mapped back to S by placing them into a matrix that is the same
size of S.

We can use this same strategy to simply number each unique element of
S and map it back to the diagonal of V:
k<-nrow(LDSCoutput$S_Stand)
Snum<-matrix(0, k, k)
Snum[lower.tri(Snum,diag=TRUE)] <-1:(k*(k+1)/2)
Snum"
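To make the mapping concrete (my addition, not part of the quoted thread), running that numbering for k = 3 gives:

```r
# Worked example of the Snum mapping for k = 3 traits.
# lower.tri() fills column-major, so the unique elements are
# numbered s11, s21, s31, s22, s32, s33.
k <- 3
Snum <- matrix(0, k, k)
Snum[lower.tri(Snum, diag = TRUE)] <- 1:(k * (k + 1) / 2)
Snum
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    2    4    0
# [3,]    3    5    6
# i.e., diag(V)[4] is the squared SE of S[2,2],
# and diag(V)[6] is the squared SE of S[3,3].
```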

Finally, there are other methods for fitting SEMs to raw data using GREML-type methods. I have not used them personally, but you could check those out.
https://gitlab.gwdg.de/beate.stpourcain/grmsem
https://pubmed.ncbi.nlm.nih.gov/33604756/
