Hi All,
I have been using the method suggested previously in this google group from Mallard et al. 2019 to calculate the effective sample size for each of my factors. I am using a three factor model with good fit however when I try to calculate the sample size for these factors I am getting very low estimates.
When I run a common factor model with just one factor’s set of indicators and calculate Neff using this method I calculate a normal sample size. This makes me believe that it is not a problem with the traits/summary statistics I am using, but instead a problem with the model? I am confused because running the model without SNP effects I encountered no issues. For both with and without SNP effects the package is not producing any warnings or errors for this model.
I have double checked that my sumstats arguments are correct, so I don’t think that is part of the problem. Additionally, I have looked at Manhattan plots for the summary statistics produced for each factor and didn’t notice any outliers, strange hits, etc.
I am wondering if anyone else has encountered this issue and has some more troubleshooting tips? Would it be bad practice to use a sum of the indicators' sample sizes as the N for a factor?
Thanks for your help.
Best,
Sarah
3.) the model without SNP effects:
model <- 'F1 =~ NA*Trait1 + Trait2
F2 =~ NA*Trait3 + Trait4
F3 =~ NA*Trait5 + Trait6
F1~~F3
F2~~F3
F1~~F2
Trait6 ~~ a*Trait6
a > 0.001'
the model with SNP effects:
model <- 'F1 =~ NA*Trait1 + Trait2
F2 =~ NA*Trait3 + Trait4
F3 =~ NA*Trait5 + Trait6
F1~~F3
F2~~F3
F1~~F2
F1~~1*F1
F2~~1*F2
F3~~1*F3
F1~SNP
F2~SNP
F3~SNP
Trait6 ~~ a*Trait6
a > 0.001'
4.) $modelfit
chisq df p_chisq AIC CFI SRMR
df 39.84087 7 1.350207e-06 67.84087 0.9864015 0.05066982
$results
lhs op rhs Unstand_Est Unstand_SE STD_Genotype
1 F1 =~ Trait1 0.295063279 0.0274416632473787 0.858845172
2 F1 =~ Trait2 0.189695648 0.017557277835497 0.802559974
3 F1 ~~ F1 1.000000000 1.000000000
4 F1 ~~ F2 0.430304854 0.0444352144533238 0.430305016
5 F1 ~~ F3 0.004536797 0.0328763831014492 0.004536684
6 F2 =~ Trait3 0.292602852 0.0162099889635073 0.771837770
7 F2 =~ Trait4 0.222662184 0.00894191837042002 0.916305166
8 F2 ~~ F2 1.000000000 1.000000000
9 F2 ~~ F3 0.854149674 0.031667674035896 0.854149560
10 F3 =~ Trait5 0.236838630 0.00896902039457271 0.817443240
11 F3 =~ Trait6 0.224407416 0.00498082427575073 1.011202697
12 F3 ~~ F3 1.000000000 1.000000000
13 Trait1 ~~ Trait1 0.030969446 0.0170805890815375 0.262384136
14 Trait2 ~~ Trait2 0.019883291 0.00753203898272467 0.355898073
15 Trait3 ~~ Trait3 0.058099755 0.0085659161816976 0.404266076
16 Trait4 ~~ Trait4 0.009470686 0.00428082206720382 0.160385161
17 Trait5 ~~ Trait5 0.027851339 0.00403843911325263 0.331786120
STD_Genotype_SE STD_All p_value
1 0.0798748184196213 0.858845530 5.77498857409189e-27
2 0.0742808578550793 0.802559740 3.28136512401416e-27
3 1.000000000 <NA>
4 0.0444351943001873 0.430305016 3.53090364720803e-22
5 0.032876395072153 0.004536684 0.890243868305833
6 0.0427592212938167 0.771837917 7.77924981707172e-73
7 0.0367979766140435 0.916305020 7.26752687614457e-137
8 1.000000000 <NA>
9 0.0316676460653561 0.854149560 3.12502299978823e-160
10 0.0309564106230164 0.817443416 1.15997151367239e-153
11 0.0224441042601742 1.000000000 < 5e-300
12 1.000000000 <NA>
13 0.14471123645876 0.262384355 0.0698107103824013
14 0.134819357492529 0.355897864 0.00829480075601682
15 0.0596030277178741 0.404266230 1.17977534713489e-11
16 0.0724959796244617 0.160385110 0.0269423347190454
17 0.0481087935463484 0.331786262 5.32767188010616e-12
a.)
Trait 1 mean chisq = 1.1834, intercept= 1.008, N = 114,091
Trait 2 mean chisq = 1.2178, intercept = 1.0251, N = 175,163
Trait 3 mean chisq = 1.3546, intercept = 1.0103, N = 313,963
Trait 4 mean chisq = 1.1434, intercept = 1.0055, N = 121,604
Trait 5 mean chisq = 1.1972, intercept = 0.9989, N = 121,604
Trait 6 mean chisq = 1.4472, intercept = 0.9217, N = 537,349
Hi Sarah,
Thanks very much for providing all of this information. After talking further amongst ourselves we have realized that the equation listed on the GitHub wiki is not going to produce sensible estimates when using unit (residual) variance identification as you do in your multivariate GWAS models when you fix the residual factor variances to 1. I want to start by saying this form of identification tends to be fine for producing sensible GWAS estimates, and the absence of errors/warnings, sensible Manhattan plots, and sensible LDSC univariate intercepts suggests that the multivariate GWAS itself estimated fine. So the issue is just with using these results as input for the effective sample size formula.
The reason for this issue is that the effective sample size equation assumes that you have betas that are standardized with respect to the total variance in the outcome variable. As both the total variance and genetic variance of latent genetic factors in Genomic SEM are undefined, this makes using the effective sample size formula tricky. You could use non-linear constraints to make the total genetic variance (i.e., variance explained by SNP + residual variance = 1) equal to 1. However, these sorts of constraints tend to cause a lot of problems with model convergence. This is also likely to put effective N on a non-intuitive scale since most phenotypes have a SNP-based heritability of ~5%, in which case effective N for the factor will look like 1/20th of that for a normal GWAS because you are, relatively speaking, only explaining a very small portion of the total genetic variance for this factor.
All these things taken together I would say there are two immediate solutions to the problem:
1. You can switch to using unit loading identification for each of your factors and freely estimate the factor variances. In this case, you would interpret the effective sample size of the factor as scaled relative to the heritability of the reference indicator.
2. If effective sample size is more of a litmus test to make sure your multivariate GWAS is running properly, you can rely more on these other follow-ups we’ve discussed (e.g., LDSC univariate intercept), and to get a sense of overall power report mean chi-square. Again, if you don’t want to report effective N (which is totally fine) then you could just use the results you already have.
Really appreciate you working with us on this one and let us know if you have any follow-up questions or issues. We will work on making this all clearer on the wiki now that this issue has been raised.
Best,
Andrew
--
You received this message because you are subscribed to the Google Groups "Genomic SEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genomic-sem-us...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genomic-sem-users/91f6e869-692a-4fb1-aba4-b99d8c23efb1n%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genomic-sem-users/810dcc05-82c8-4705-a8a8-c1891eff85fbn%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genomic-sem-users/5e957419-b912-4420-9897-e46459fd95b3n%40googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genomic-sem-users/7a06262e-2b91-429f-9d9d-cdf00786d677n%40googlegroups.com.