Problems Calculating Neff

Sarah Colbert

Oct 5, 2020, 3:40:22 PM

to Genomic SEM Users

Hi All,

I have been using the method from Mallard et al. 2019, suggested previously in this Google group, to calculate the effective sample size for each of my factors. I am using a three-factor model with good fit; however, when I try to calculate the sample size for these factors, I am getting very low estimates.

When I run a common factor model with just one factor's set of indicators and calculate Neff using this method, I get a normal sample size. This makes me think the problem is not with the traits/summary statistics I am using, but with the model? I am confused because I encountered no issues running the model without SNP effects. Both with and without SNP effects, the package produces no warnings or errors for this model.

I have double checked that my sumstats arguments are correct, so I don’t think that is part of the problem. Additionally, I have looked at Manhattan plots for the summary statistics produced for each factor and didn’t notice any outliers, strange hits, etc.

I am wondering if anyone else has encountered this issue and has some more troubleshooting tips? Would it be bad practice to use a sum of the indicators' sample sizes as the N for a factor?

Thanks for your help.

Best,

Sarah

Michel Nivard

Oct 5, 2020, 6:08:16 PM

to Genomic SEM Users

Hi Sarah,

I think we need a bit more info to debug, but let's begin by stating that the Mallard et al. formula is derived to back the N out of sumstats from a standard GWAS or GWAS meta-analysis. This doesn't mean it won't work at all for other GWAS (like factor GWAS), but perhaps there are models for which it works less well.
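For reference, the Mallard et al. approach as used in this thread backs N out of each SNP's standard error and minor allele frequency, N ≈ 1 / (2·MAF·(1 − MAF)·SE²), averaged over SNPs (the same expression as the R one-liner quoted later in the thread). A minimal Python sketch, assuming per-SNP MAF and SE lists are already in hand; the function name `effective_n` is hypothetical, and any SNP filtering (e.g., by MAF) that a given recipe may apply is left out:

```python
import math
from statistics import mean


def effective_n(mafs, ses):
    """Average over SNPs of 1 / (2 * MAF * (1 - MAF) * SE^2).

    Assumes betas (and hence SEs) are on a scale where the outcome has
    unit total variance, which is what the derivation relies on.
    """
    if len(mafs) != len(ses) or not mafs:
        raise ValueError("need equal-length, non-empty MAF and SE lists")
    return mean(1.0 / (2.0 * p * (1.0 - p) * se ** 2) for p, se in zip(mafs, ses))


if __name__ == "__main__":
    # Simulated SEs for a standardized outcome with N = 100,000:
    # SE = 1 / sqrt(2 * p * (1 - p) * N), so the formula should recover N.
    n_true = 100_000
    mafs = [0.1, 0.25, 0.4]
    ses = [1.0 / math.sqrt(2 * p * (1 - p) * n_true) for p in mafs]
    print(round(effective_n(mafs, ses)))
```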

So here are the pieces of info I would like/need:

1. For each factor, the LD score intercept (run either classic LDSC or feed the factor sumstats back into GenomicSEM's ldsc() function and get the intercept from the diagonal of the matrix "I" returned by ldsc()).

2. For each of the three factor GWAS, the mean chi-square (Z^2) statistic, also easily obtained from the LDSC or ldsc() log.

3. The model (without and with SNP). Feel free to mask or alter trait names if you're not ready to publicly share your empirical results with the world just yet.

4. Model fit and output for the 3-factor model; feel free to mask trait names etc.

Things I would like to have, but less pressing:

a. the above statistics (chi-square and intercept) for the traits going in, as well as their N

b. Manhattan plots and QQ plots for the traits going in and for the factors

I think low effective N could be a function of model misfit at the SNP level, but I can't say just yet. You can either mail this to me and Andrew or we can sort it out here (which may be informative for others).

Best,

Michel

Sarah Colbert

Oct 5, 2020, 6:26:37 PM

to Genomic SEM Users

Hi Michel,

Thanks for your response, I will get started on gathering that information ASAP. First, however, I have a question. In order to get each factor's LD score intercept (using either LDSC or genomicSEM) I need to first munge the factor summary statistics, correct? What N values do you suggest I put into the munge function for the factors?

Thanks,

Sarah

Sarah Colbert

Oct 7, 2020, 10:41:10 AM

to Genomic SEM Users

Hi Michel,

I have gathered some of the other information while I wait for a response regarding what N to use when I run the factor sumstats through LDSC. I will also work on creating the Manhattan and QQ plots in the meantime. Here is the information I have so far:

3.) the model without SNP effects:

model <- 'F1 =~ NA*Trait1 + Trait2

F2 =~ NA*Trait3 + Trait4

F3 =~ NA*Trait5 + Trait6

F1~~F3

F2~~F3

F1~~F2

Trait6 ~~ a*Trait6

a > 0.001'

the model with SNP effects:

model <- 'F1 =~ NA*Trait1 + Trait2

F2 =~ NA*Trait3 + Trait4

F3 =~ NA*Trait5 + Trait6

F1~~F3

F2~~F3

F1~~F2

F1~~1*F1

F2~~1*F2

F3~~1*F3

F1~SNP

F2~SNP

F3~SNP

Trait6 ~~ a*Trait6

a > 0.001'

4.) $modelfit
chisq df p_chisq AIC CFI SRMR
df 39.84087 7 1.350207e-06 67.84087 0.9864015 0.05066982

$results
lhs op rhs Unstand_Est Unstand_SE STD_Genotype
1 F1 =~ Trait1 0.295063279 0.0274416632473787 0.858845172
2 F1 =~ Trait2 0.189695648 0.017557277835497 0.802559974
3 F1 ~~ F1 1.000000000 1.000000000
4 F1 ~~ F2 0.430304854 0.0444352144533238 0.430305016
5 F1 ~~ F3 0.004536797 0.0328763831014492 0.004536684
6 F2 =~ Trait3 0.292602852 0.0162099889635073 0.771837770
7 F2 =~ Trait4 0.222662184 0.00894191837042002 0.916305166
8 F2 ~~ F2 1.000000000 1.000000000
9 F2 ~~ F3 0.854149674 0.031667674035896 0.854149560
10 F3 =~ Trait5 0.236838630 0.00896902039457271 0.817443240
11 F3 =~ Trait6 0.224407416 0.00498082427575073 1.011202697
12 F3 ~~ F3 1.000000000 1.000000000
13 Trait1 ~~ Trait1 0.030969446 0.0170805890815375 0.262384136
14 Trait2 ~~ Trait2 0.019883291 0.00753203898272467 0.355898073
15 Trait3 ~~ Trait3 0.058099755 0.0085659161816976 0.404266076
16 Trait4 ~~ Trait4 0.009470686 0.00428082206720382 0.160385161
17 Trait5 ~~ Trait5 0.027851339 0.00403843911325263 0.331786120

STD_Genotype_SE STD_All p_value
1 0.0798748184196213 0.858845530 5.77498857409189e-27
2 0.0742808578550793 0.802559740 3.28136512401416e-27
3 1.000000000 <NA>
4 0.0444351943001873 0.430305016 3.53090364720803e-22
5 0.032876395072153 0.004536684 0.890243868305833
6 0.0427592212938167 0.771837917 7.77924981707172e-73
7 0.0367979766140435 0.916305020 7.26752687614457e-137
8 1.000000000 <NA>
9 0.0316676460653561 0.854149560 3.12502299978823e-160
10 0.0309564106230164 0.817443416 1.15997151367239e-153
11 0.0224441042601742 1.000000000 < 5e-300
12 1.000000000 <NA>
13 0.14471123645876 0.262384355 0.0698107103824013
14 0.134819357492529 0.355897864 0.00829480075601682
15 0.0596030277178741 0.404266230 1.17977534713489e-11
16 0.0724959796244617 0.160385110 0.0269423347190454
17 0.0481087935463484 0.331786262 5.32767188010616e-12

a.)

Trait 1 mean chisq = 1.1834, intercept= 1.008, N = 114,091

Trait 2 mean chisq = 1.2178, intercept = 1.0251, N = 175,163

Trait 3 mean chisq = 1.3546, intercept = 1.0103, N = 313,963

Trait 4 mean chisq = 1.1434, intercept = 1.0055, N = 121,604

Trait 5 mean chisq = 1.1972, intercept = 0.9989, N = 121,604

Trait 6 mean chisq = 1.4472, intercept = 0.9217, N = 537,349

Thanks,

Sarah

Michel Nivard

Oct 7, 2020, 12:44:53 PM

to Genomic SEM Users

The intercept doesn't really move with N. Use some sensible value, like the median or mean of the Ns for the individual traits that load on the factor, to avoid issues.
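A minimal sketch of that suggestion in Python, using the indicator Ns Sarah reported above; `munge_n` is a hypothetical helper, and the value it returns is only a placeholder N for munging, not an estimate of the factor's sample size:

```python
from statistics import mean, median


def munge_n(indicator_ns, how="median"):
    """Placeholder N for munging factor sumstats: median or mean of indicator Ns."""
    if how == "median":
        return median(indicator_ns)
    if how == "mean":
        return mean(indicator_ns)
    raise ValueError("how must be 'median' or 'mean'")


# Indicator Ns reported earlier in the thread (Trait1..Trait6).
factor_indicator_ns = {
    "F1": [114_091, 175_163],  # Trait1, Trait2
    "F2": [313_963, 121_604],  # Trait3, Trait4
    "F3": [121_604, 537_349],  # Trait5, Trait6
}

for factor, ns in factor_indicator_ns.items():
    print(factor, munge_n(ns))
```

With only two indicators per factor, the median and the mean coincide.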

Michel

Michel Nivard

Oct 7, 2020, 1:05:03 PM

to Genomic SEM Users

You have 2 indicators per factor, which requires additional identifying constraints to work, so I am unsure whether those are in place here. Let me look into it.

Michel

Sarah Colbert

Oct 7, 2020, 2:59:38 PM

to Genomic SEM Users

Hi Michel,

Yes, I am using 2-indicator factors. I was under the impression that this model was still properly identified; however, I also tried a model in which I set the indicators' loadings equal to each other (e.g., F1 =~ NA*Trait1 + a*Trait1 + a*Trait2) so that I could be positive that the model was locally identified, and I still obtained low Neff estimates.

Also, here is the other information you asked for:

F1 intercept = 1.0101, mean chisq = 1.3432

F2 intercept = 1.0034, mean chisq = 1.4313

F3 intercept = 0.9434, mean chisq = 1.6114

Sarah

Sarah Colbert

Oct 7, 2020, 3:08:17 PM

to Genomic SEM Users

I forgot to mention that Traits 1, 2, 3, and 4 are all correlated with each other, and Traits 3, 4, 5, and 6 are all correlated with each other. For this reason, I assumed my model was properly identified, since each indicator is also correlated with another indicator that does not load on the same factor.

Michel Nivard

Oct 7, 2020, 6:10:04 PM

to Genomic SEM Users

That makes sense, I think. Can't believe I forgot to ask: what are all the effective Ns you are getting?

Michel

Sarah Colbert

Oct 7, 2020, 6:47:01 PM

to Genomic SEM Users

Here are the effective N's I have calculated:

F1 N = 12,430.34

F2 N = 52,583

F3 N = 29,407.33

Sarah

agro...@gmail.com

Oct 8, 2020, 10:46:54 AM

to Genomic SEM Users

Hi Sarah,

Thanks very much for providing all of this information. After talking further amongst ourselves we have realized that the equation listed on the GitHub wiki is not going to produce sensible estimates when using unit (residual) variance identification as you do in your multivariate GWAS models when you fix the residual factor variances to 1. I want to start by saying this form of identification tends to be fine for producing sensible GWAS estimates, and the absence of errors/warnings, sensible Manhattan plots, and sensible LDSC univariate intercepts suggests that the multivariate GWAS itself estimated fine. So the issue is just with using these results as input for the effective sample size formula.

The reason for this issue is that the effective sample size equation assumes betas that are standardized with respect to the total variance in the outcome variable. Because both the total variance and the genetic variance of latent genetic factors in Genomic SEM are undefined, using the effective sample size formula is tricky. You could use non-linear constraints to fix the total genetic variance of each factor (i.e., variance explained by the SNP + residual variance) to 1. However, these sorts of constraints tend to cause a lot of problems with model convergence. This is also likely to put effective N on a non-intuitive scale: most phenotypes have a SNP-based heritability of ~5%, in which case effective N for the factor will look like 1/20th of that for a normal GWAS, because you are, relatively speaking, only explaining a very small portion of the total genetic variance for this factor.
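The 1/20th figure follows from how the formula reacts to rescaling the outcome: multiplying the outcome by a constant c multiplies both beta and SE by c, leaves Z = beta/SE unchanged, and divides the formula's N by c². Scaling a factor so that its genetic variance (rather than its total variance) is 1 amounts to c = 1/√h², so with h² = 0.05 the formula's N shrinks by a factor of 20. A small Python sketch with made-up numbers; the helper names are hypothetical:

```python
import math


def n_from_se(maf, se):
    """Per-SNP version of the effective N formula: 1 / (2 p q SE^2)."""
    return 1.0 / (2.0 * maf * (1.0 - maf) * se ** 2)


def rescale_outcome(beta, se, c):
    """Multiplying the outcome by c multiplies both beta and its SE by c."""
    return c * beta, c * se


# Made-up SNP on a total-variance-standardized outcome with N = 200,000.
maf, n, beta = 0.3, 200_000, 0.01
se = 1.0 / math.sqrt(2 * maf * (1 - maf) * n)

# Rescale so the *genetic* variance is 1, assuming SNP h2 = 0.05.
h2 = 0.05
beta_g, se_g = rescale_outcome(beta, se, 1.0 / math.sqrt(h2))

print(n_from_se(maf, se))        # about 200,000
print(n_from_se(maf, se_g))      # about 200,000 * 0.05 = 10,000
print(beta / se, beta_g / se_g)  # identical Z statistics
```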

All these things taken together I would say there are two immediate solutions to the problem:

1. You can switch to using unit loading identification for each of your factors and freely estimate the factor variances. In this case, you would interpret the effective sample size of the factor as scaled relative to the heritability of the reference indicator.

2. If effective sample size is more of a litmus test to make sure your multivariate GWAS is running properly, you can rely more on these other follow-ups we’ve discussed (e.g., LDSC univariate intercept), and to get a sense of overall power report mean chi-square. Again, if you don’t want to report effective N (which is totally fine) then you could just use the results you already have.
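A rough sketch of why option 1 changes the numbers so much. If F_uv is the factor under unit (residual) variance identification and λ is the reference indicator's loading in that parameterization, the unit loading factor is approximately F_ul = λ·F_uv, so the SNP beta and SE are both multiplied by λ and the formula's N is divided by λ². This treats λ as fixed (ignoring its sampling error and any change in λ once the SNP is added), so it is an approximation for intuition only, not a substitute for refitting the model; the function name and numbers below are hypothetical:

```python
def neff_unit_loading(neff_unit_variance, ref_loading):
    """Approximate Neff after switching to unit loading identification.

    Assumes F_ul = ref_loading * F_uv with ref_loading treated as fixed,
    so beta and SE scale by ref_loading and 1 / (2 p q SE^2) scales by
    1 / ref_loading**2.
    """
    if ref_loading == 0:
        raise ValueError("reference loading must be non-zero")
    return neff_unit_variance / ref_loading ** 2


# Hypothetical values: unstandardized loadings in Genomic SEM are often well
# below 1, so a unit variance Neff can look many times too small.
print(neff_unit_loading(10_000, 0.25))
```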

Really appreciate you working with us on this one and let us know if you have any follow-up questions or issues. We will work on making this all clearer on the wiki now that this issue has been raised.

Best,

Andrew

Sarah Colbert

Oct 8, 2020, 11:59:32 AM

to Genomic SEM Users

Hi Andrew,

Thanks for explaining all of that. I am glad it is not a problem with the model and more of an incompatibility issue. Since I plan to use the factor sumstats for follow-up analyses, and many of the programs require a sample size, I suppose I will go with option 1.

Thank you to you both for all of your help.

Best,

Sarah

Sarah Colbert

Oct 9, 2020, 9:33:14 AM

to Genomic SEM Users

Hi Andrew and Michel,

Sorry, but I have had another question pop up. In the recent pre-print "Multivariate GWAS elucidates the genetic architecture of alcohol consumption and misuse, corrects biases, and reveals novel associations with disease" it looks like the model (figure 1) uses unit variance loading. In the methods section, it is stated: " The effective sample size for each latent factor was estimated using the approach described by Mallard and colleagues (16). "

I am wondering why you were able to calculate effective sample size for this model with unit variance loading however it does not work for my model? Was Neff actually calculated for a different model with freely estimated factor variances that wasn't shown? Or is a model with unit variance loading equal to a model with freely estimated factor variances?

Sorry if I am missing something, I just want to make sure that I am understanding it completely.

Thanks!

Sarah

Elliot Tucker-Drob

Oct 9, 2020, 9:53:33 AM

to Sarah Colbert, Genomic SEM Users

Dear Sarah,

I cannot speak specifically to what is in Travis's code for that paper, but I can say with a good deal of certainty that he used unit loading identification. Keep in mind that fixing the variance of a dependent variable to 1.0 isn't actually unit variance identification, because you are really fixing its residual variance. That is a strange parameterization because it is only approximately standardized (the total variance is 1 + 2pq*beta^2), and in practice it has a greater chance of resulting in longer run times, local fit minima, and convergence issues. We generally recommend unit loading identification for GWAS within Genomic SEM for these reasons, which are unrelated to effective N. Also note that the figure in Travis's paper that you are referring to is a standardized solution in which the total genetic variance expectation for each indicator is 1.0, rather than its SNP h2. Given this, I would not interpret that figure as representing the exact parameterization of the model that was fit (there are alternative parameterizations that are statistically equivalent and produce identical standardized solutions).
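To put numbers on the "1 + 2pq*beta^2" point: under additive 0/1/2 coding with Hardy-Weinberg proportions, a SNP with minor allele frequency p has variance 2pq, so a factor whose residual variance is fixed to 1 has total variance 1 + 2pq·beta², which differs from SNP to SNP. A small Python sketch with illustrative values (the function name is hypothetical):

```python
def total_factor_variance(maf, beta, residual_var=1.0):
    """Var(F) = 2 p q beta^2 + residual variance, for F = beta * SNP + residual.

    Assumes additive 0/1/2 coding and Hardy-Weinberg equilibrium,
    so Var(SNP) = 2 p q.
    """
    p, q = maf, 1.0 - maf
    return 2.0 * p * q * beta ** 2 + residual_var


# The deviation from 1 changes from SNP to SNP, which is why this
# parameterization is only approximately standardized.
for maf, beta in [(0.05, 0.02), (0.5, 0.02), (0.5, 0.2)]:
    print(maf, beta, total_factor_variance(maf, beta))
```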

Elliot

--

Elliot M. Tucker-Drob, Ph.D.
Professor

Department of Psychology
Faculty Research Associate
Population Research Center
The University of Texas at Austin
108 E. Dean Keeton Stop A8000
Austin, TX 78712-0187
tucke...@utexas.edu
www.lifespanlab.com

--
You received this message because you are subscribed to the Google Groups "Genomic SEM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genomic-sem-us...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/genomic-sem-users/91f6e869-692a-4fb1-aba4-b99d8c23efb1n%40googlegroups.com.

Sarah Colbert

Oct 9, 2020, 9:59:56 AM

to Genomic SEM Users

Hi Elliot,

Thank you for explaining that, it seems I was misinterpreting the figure/model. I now understand the difference and will plan on using unit loading identification.

Best,

Sarah

Claire Morrison

Oct 9, 2020, 5:47:03 PM

to Genomic SEM Users

Hi everyone,

Sorry to bring this back up (I saw you are planning on clarifying this on the GitHub wiki, so feel free to just tell me to wait it out if this will be answered there!), but I'm running into this issue as well and have a follow-up question. If we are using unit loading identification, can we use the effective N formula listed on GitHub as is and 'trust it' in most cases? Or do we need to modify it to reflect the estimated variance of the factor? Or do we just interpret the effective sample size of the factor as scaled relative to the heritability of the reference indicator, in which case that number should be OK to use downstream?

Thanks!

Claire

Elliot Tucker-Drob

Oct 9, 2020, 6:50:16 PM

to Claire Morrison, Genomic SEM Users

It depends on what you'll be doing with it. In some cases you'll be using it for post-GWAS analyses that, e.g. use N and beta to back out Z or p. In such cases, as long as the betas that you input were the ones used to calculate the effective N (and didn't come from another parameterization) I would expect the results to be the same across unit loading and unit variance parameterizations. It would be prudent to verify this for your specific application (and of course you could change the reference indicator used for unit loading identification to produce a variety of equivalent models).
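A minimal Python illustration of the "back out Z from N and beta" case, assuming the common approximation SE ≈ 1/√(2pqN). If the beta and the N both come from the same parameterization, rescaling the factor multiplies beta by c and divides N by c², so the recovered Z is unchanged. Because effective N is an average over SNPs, the per-SNP Z recovered this way will only approximately match the original Z; the helper name and numbers are hypothetical:

```python
import math


def z_from_beta_n(beta, maf, n):
    """Approximate Z using SE ~= 1 / sqrt(2 * p * q * N)."""
    return beta * math.sqrt(2.0 * maf * (1.0 - maf) * n)


beta, se, maf = 0.01, 0.002, 0.2
n = 1.0 / (2.0 * maf * (1.0 - maf) * se ** 2)  # N implied by this SNP's SE

# Same SNP after rescaling the factor by c (e.g., a different identification choice).
c = 3.0
z_original = z_from_beta_n(beta, maf, n)
z_rescaled = z_from_beta_n(c * beta, maf, n / c ** 2)
print(z_original, z_rescaled, beta / se)  # all approximately 5
```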

Alternatively, you might use it when entering your sumstats into LDSC. In this case, the N that you input will alter the heritability estimate. Note, however, that the heritability of a latent Genomic SEM factor is not typically defined, because the environmental factor structure has not been modeled. I would therefore not recommend trying to estimate the heritability of a latent factor. However, you can still produce an interpretable LDSC intercept and Z statistic for the LDSC slope that are not affected by N (anything sensible should produce the same intercept and LDSC slope, but again, you should verify). Also note that LDSC trims out excessively high chi-square SNPs, where "excessive" is relative to N, so if you alter N you may see more or fewer SNPs removed from the LDSC analysis; in our ldsc function, the log will report this.
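To make the trimming point concrete: to our understanding, when no explicit cap is given, the original LDSC software drops SNPs whose chi-square exceeds max(80, 0.001·N). Treat that rule as an assumption and check the log of the version you run (as noted above, the GenomicSEM ldsc() log reports how many SNPs were removed). A Python sketch with made-up chi-square values, showing how the number of removed SNPs changes with the N supplied at munging; the function names are hypothetical:

```python
def chisq_cutoff(n_max, floor=80.0, fraction=0.001):
    """Assumed LDSC default cap: max(80, 0.001 * N)."""
    return max(floor, fraction * n_max)


def n_trimmed(chisqs, n_max):
    """Count SNPs that would be dropped under the assumed cap."""
    cutoff = chisq_cutoff(n_max)
    return sum(1 for x in chisqs if x > cutoff)


# Made-up chi-square statistics for a handful of SNPs.
chisqs = [1.2, 50.0, 95.0, 130.0, 300.0]
for n in (50_000, 200_000):
    print(n, chisq_cutoff(n), n_trimmed(chisqs, n))
```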

If you are interested in getting an intuitive sense of how much power has been boosted by using a factor model, I would suggest calculating the mean chi-square of the sumstats for the factor. You can then take the ratio of the mean chi-square for the factor sumstats to the mean chi-square for a single indicator's sumstats to get a sense of the relative increase in power. If you are hell-bent on obtaining an effective N to index this relative increase, then I would recommend obtaining effective N from sumstats for a factor that has been scaled using unit loading identification. As you say, you should then be "interpreting the effective sample size of the factor as scaled relative to the heritability of the reference indicator." To be exact, for a single common factor model, it will be scaled relative to the genetic communality of the reference indicator (the heritability of that indicator that is explained by the general factor).
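As a worked example of the mean chi-square ratio, here is a Python sketch using the mean chi-square values Sarah reported earlier in this thread for her three factors and their indicators. Variable and function names are illustrative:

```python
# Mean chi-square values reported earlier in the thread.
factor_chisq = {"F1": 1.3432, "F2": 1.4313, "F3": 1.6114}
indicator_chisq = {
    "Trait1": 1.1834, "Trait2": 1.2178, "Trait3": 1.3546,
    "Trait4": 1.1434, "Trait5": 1.1972, "Trait6": 1.4472,
}
indicators = {
    "F1": ["Trait1", "Trait2"],
    "F2": ["Trait3", "Trait4"],
    "F3": ["Trait5", "Trait6"],
}


def chisq_ratio(factor, indicator):
    """Ratio of a factor's mean chi-square to one indicator's mean chi-square."""
    return factor_chisq[factor] / indicator_chisq[indicator]


ratios = {
    (f, t): chisq_ratio(f, t) for f, traits in indicators.items() for t in traits
}
for (f, t), r in ratios.items():
    print(f"{f} vs {t}: {r:.3f}")
```

A ratio above 1 means the factor sumstats carry more signal than that indicator's sumstats; this gives a sense of relative power without committing to an effective N.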

As you anticipated, we are planning to clarify this more on the wiki (no promises on timing), but I think the above addresses your question for now.


Claire Morrison

Oct 14, 2020, 11:32:01 AM

to Genomic SEM Users

Thanks, Elliot. So far the effective N does seem to be similar across methods of model identification for me. Relatedly, when playing around with unit loading and unit variance identification, I've noticed that in UserModels without SNP effects, equality constraints are applied to both the standardized and unstandardized output. I know this is acknowledged in a message after the model runs, which just says it is not available in the current version, but I was wondering if I could get some clarity on what is being done to impose those constraints across both the standardized and unstandardized solutions. Is it recommended to just report unstandardized estimates if using equality constraints? Again, I really appreciate the responses and apologize for all the questions.

Elliot Tucker-Drob

unread,

Oct 14, 2020, 12:17:51 PM10/14/20

to Claire Morrison, Genomic SEM Users

Hi Claire,

I'm not sure I understand what you mean about the equality constraints. Perhaps you can share your model and the relevant portions of the output that highlight what you are referring to?

Elliot


Claire Morrison

Oct 14, 2020, 12:35:54 PM

to Genomic SEM Users

Sure! So when I specify:

model<- "

f1=~trait1+trait2

f2=~z*trait3 + z*trait4

f3=~trait5+trait6

f4=~trait7+trait8"

The model runs and then returns this:

The model is identified because of correlations among all traits, but f2 correlates least with the other factors, so I've found that adding the equality constraint on its indicators helps. In the output, the unstandardized estimates for trait3 and trait4 are equal, but so are the standardized estimates, whereas I would have expected the standardized estimates to differ given the slightly different STD_Genotype SEs and p values. Here is an abridged version of the output showing f2:

lhs op rhs Unstand_Est Unstand_SE STD_Genotype STD_Genotype_SE STD_All p_value
13 f2 ~~ f2 1.0000000000 1.000000000 1.0000000000 NA
14 f2 =~ trait3 0.2672879257 0.0421250104787937 0.910625919 0.170233819581702 0.9106259183 2.222631e-10
15 f2 =~ trait4 0.2672879250 0.0414862171938591 0.910625866 0.168272786745973 0.9106264791 1.172794e-10

Elliot Tucker-Drob

Oct 14, 2020, 12:52:40 PM

to Claire Morrison, Genomic SEM Users

Hi Claire,

Please read "A Note on Standardized Output in Genomic SEM" on the wiki. The only difference between the Unstand and STD_Genotype models is whether they are run on the genetic covariance matrix or the genetic correlation matrix. Any constraints on your model will be applied to both. STD_All only reports point estimates from lavaan and does not produce SEs or p values. The slightly different SEs and p values can probably be discounted as rounding error in the computations, since these are very similar and very significant. Whether to report the standardized or unstandardized estimates (or both) is up to you; it's a matter of what is most relevant to what you are trying to communicate.
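For intuition about the Unstand vs STD_Genotype distinction: a correlation matrix is the covariance matrix rescaled by the standard deviations, r_ij = s_ij / √(s_ii·s_jj), which is what R's cov2cor() does. Because the same model, including any equality constraints, is fit to each matrix, loadings constrained equal come out equal in both sets of estimates. A small Python sketch of the rescaling step only, with a made-up 2×2 genetic covariance matrix:

```python
import math


def cov_to_cor(S):
    """Rescale a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(S[i][i]) for i in range(len(S))]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(len(S))] for i in range(len(S))]


# Made-up genetic covariance matrix (diagonal = SNP heritabilities).
S = [[0.04, 0.012],
     [0.012, 0.09]]
print(cov_to_cor(S))
```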

I'm not sure about the wisdom in adding an equality constraint in order to obtain higher factor correlations. That may result in some overfitting. But that's a user choice, and not really something related to the software, so I will leave that decision up to you.


Claire Morrison

Oct 14, 2020, 1:44:23 PM

to Genomic SEM Users

Got it, and sorry if that question seemed repetitive given the note on standardized output. I had read it but was still slightly confused; this helps. Thanks again, Elliot!

DP Wightman

Apr 12, 2022, 10:47:19 AM

to Genomic SEM Users

Hi all,

Apologies for continuing an old thread. I read through this thread and the section on the wiki about calculating sample sizes for factors and I am still confused as to why my estimated sample size is so low.

I have run a common factor model with 4 traits (relatively low h2) and obtained a sample size estimate of 1582.281. I have included the effective sample sizes of the input traits, their h2s, their rgs, and the model I specified below. t3 had negative residual variance with the default common factor model (I think because of its low sample size), so I specified it to have 0 residual variance.

Is the low N of t3 and the low h2 of the other traits leading to the low N estimate for the common factor?

Thanks in advance,

Doug

Max Effective N
t1: 113888
t2: 80713
t3: 6306
t4: 125299

H2 of each trait
t1 Total Observed Scale h2: 0.0468 (0.0101)
t2 Total Observed Scale h2: 0.0653 (0.0072)
t3 Total Observed Scale h2: 0.1717 (0.0825)
t4 Total Observed Scale h2: 0.0717 (0.007)

rg

t1 and t2: 0.3324 (0.0763)
t1 and t3: 0.866 (0.1874)
t1 and t4: 0.1062 (0.068)
t2 and t3: 0.5078 (0.1627)
t2 and t4: 0.1495 (0.0692)
t3 and t4: 0.6241 (0.1246)

model

commonfactor.model<-'F1=~ NA*t1 + t2 + t3 + t4
F1~~1*F1
t3 ~~ 0*t3
F1~SNP'

agro...@gmail.com

Apr 12, 2022, 11:07:53 AM

to Genomic SEM Users

Hi Doug,

The formula that we detail on the wiki for calculating the sample size of factors is really sensitive to how the variance of the factor is scaled. This makes sense if you take a step back and think about what the formula is picking up on. Since the variance of the factor is the factor heritability (h2), the sample size you would need to pick up a certain signal changes a lot depending on whether that heritability is 30% or 100%. If you are using the commonfactorGWAS function, then the default is unit loading identification, in which case the factor h2 is scaled relative to the factor loadings and h2 of the indicators, and the formula produces a good effective sample size estimate. The main thing you want to avoid is using unit variance identification (i.e., setting the residual factor variance to 1) with that effective sample size formula. Note also that the scaling doesn't matter for the actual GWAS results: the ratio of est/SE remains the same. So I would write your model this way, which uses unit loading identification instead and should produce a more interpretable sample size estimate:

commonfactor.model<-'F1=~ t1 + t2 + t3 + t4
t3 ~~ 0*t3
F1~SNP'

The final related point is that if you do input the multivariate factor GWAS results back into LD score regression (LDSC), I wouldn't report or interpret the h2 no matter what scaling you use; I would instead stick with mean chi-square as an indication of signal relative to the univariate traits' mean chi-square. I typically only input factor GWAS summary stats to LDSC to look at the LDSC intercept and make sure it's not way above or below 1, which would indicate something went wrong on the GWAS end.

Best,

Andrew

DP Wightman

Apr 13, 2022, 11:20:49 AM

to Genomic SEM Users

Hi Andrew,

Thank you for your explanation!

If I am understanding correctly, I specified the factor to have a variance of 1 (F1~~1*F1) so the estimates are scaled to assume the factor is 100% heritable, and as such a smaller sample size is required to explain the genetic signal compared to a factor with a lower heritability.

The est and SE will both be larger because the factor is more heritable, but they are proportionally larger, so the P-value is not affected. The larger SEs in the sample size calculation cause a smaller N_hat.

N_hat<-mean(1/((2*MAF*(1-MAF))*SE^2))

Thanks for your explanation, I think I understand now. I don't intend to use N_hat for anything, however I was curious to know why it was as small as it was.

Cheers,

Doug

agro...@gmail.com

Apr 21, 2022, 11:25:27 AM

to Genomic SEM Users

Hi Doug,

Your explanation sounds spot on to me! I think in some ways the N_hat calculation can be one of those quality control checks that can indicate something went wrong if this value is 100 times bigger or smaller than any of the variables that went into the model (so long as it is scaled relative to the heritability of one of your variables by using unit loading identification as we discussed in this thread). So I think it's great/sensible to calculate it even if you don't need it for any downstream analyses and I'm glad you brought this up on the google group. Let us know if you have any other questions!
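A minimal Python sketch of that sanity check, assuming N_hat was computed from unit loading sumstats; `neff_flag` and the N_hat values are hypothetical, and the 100-fold threshold is the rough rule of thumb from the message above rather than a formal test. The indicator Ns are Doug's from earlier in the thread:

```python
def neff_flag(n_hat, indicator_ns, fold=100.0):
    """True if n_hat is more than `fold` times bigger or smaller than any indicator N."""
    return any(n_hat > fold * n or n_hat < n / fold for n in indicator_ns)


indicator_ns = [113_888, 80_713, 6_306, 125_299]  # t1..t4 effective Ns
for n_hat in (60_000.0, 900.0):  # hypothetical N_hat values
    print(n_hat, neff_flag(n_hat, indicator_ns))
```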