ML vs WLS


Trine Nielsen

Nov 4, 2020, 5:34:57 AM
to Genomic SEM Users
Hello again.
I have a question regarding the two estimation methods, ML and WLS. I am working with a common factor model with 3 phenotypes: one binary phenotype with a sample size of ~53,000 and two continuous traits with sample sizes of ~250,000 and ~340,000. I would like the 3 phenotypes to be weighted equally. In the Methods section of the GenomicSEM article, I read the following passage about the two estimation methods:

“WLS estimation more heavily prioritizes reducing misfit in those cells in the S matrix that are estimated with greater precision. This has the desirable property of potentially decreasing sampling variance of the genomic SEM parameter estimates, which may boost power for SNP discovery and increase polygenic prediction. However, because the precision of cells in the S matrix is contingent on the sample sizes for the contributing univariate GWASs, WLS may produce a solution that is dominated by the patterns of association involving the most well-powered GWASs, and contain substantial local misfit in cells of S that are informed by lower-powered GWASs. In other words, WLS relative to maximum likelihood may more heavily prioritize minimizing sampling variance of the parameter estimates in the so-called variance bias tradeoff. We expect that this will only occur when the model is overidentified (that is, d.f.>0), such that exact fit cannot be obtained, and that divergence in WLS and maximum likelihood estimates will be most pronounced when there is lower sample overlap and the contributing univariate GWASs differ substantially in power. Maximum likelihood estimation may be preferred when the goal is to most evenly weight the contribution of the univariate sample statistics.”  

Based on this passage, I thought ML would be the right estimation method for my work, but WLS seems to be the norm. Could you clarify the difference between the two methods? Would you recommend WLS or ML?

Best regards, Trine

agro...@gmail.com

Nov 17, 2020, 11:15:17 AM
to Genomic SEM Users
Hi Trine, 

We use WLS as the default because it more heavily favors model estimates that closely match the parts of the genetic covariance matrix that are estimated with greater precision (i.e., smaller standard errors, typically reflecting larger sample sizes in the GWAS summary statistics used as input). In our view, a more precise estimate should be weighted more heavily when producing the model estimates. It's important to keep in mind that this does not mean the model will automatically produce larger estimates for the GWAS with larger sample sizes. For example, if the larger GWAS were not very correlated with the smaller GWAS, a common factor model estimated with WLS would more heavily favor a smaller factor loading for the larger GWAS. I prefer WLS in this case, but either is defensible depending on what you consider most appropriate when estimating the model, and it can be informative to run the model with both DWLS and ML to see how the factor structure changes. That said, I would not recommend running the GWAS piece with both DWLS and ML; I would instead run this comparison only on the base factor structure that does not include the effects of individual SNPs. Hope this helps!

Best, 
  Andrew
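
For readers who want to see the weighting difference in numbers, here is a minimal toy sketch in Python. It is not GenomicSEM code, and every name and value in it is hypothetical. It assumes three standardized traits and a model that forces all three genetic covariances to equal a single parameter theta. That constraint makes the toy overidentified, which is the situation in which the quoted passage expects the estimators to diverge (an unconstrained three-indicator common factor model has 0 degrees of freedom). The diagonally weighted estimate is the precision-weighted mean of the covariances, while the ML fit function in this toy depends only on their sum, so it treats them symmetrically.

```python
import math

def dwls_estimate(covs, ses):
    """Diagonally weighted least squares for the toy model.

    Minimises sum_i (s_i - theta)^2 / se_i^2, whose closed-form
    solution is the precision-weighted mean of the covariances.
    """
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * s for w, s in zip(weights, covs)) / sum(weights)

def ml_discrepancy(theta, covs):
    """Normal-theory ML fit function for the toy model.

    Model-implied matrix: 3x3, unit diagonal, every off-diagonal = theta.
    Terms that do not depend on theta (log|S| and p) are dropped.
    """
    total = 3.0 + 2.0 * sum(covs)  # sum of all elements of S (unit diagonal assumed)
    log_det = 2.0 * math.log(1.0 - theta) + math.log(1.0 + 2.0 * theta)
    trace = (3.0 - theta * total / (1.0 + 2.0 * theta)) / (1.0 - theta)
    return log_det + trace

def ml_estimate(covs, step=1e-4):
    """Grid search over the range where the implied matrix is positive definite."""
    n_points = int(round((0.99 - (-0.49)) / step)) + 1
    grid = [-0.49 + k * step for k in range(n_points)]
    return min(grid, key=lambda theta: ml_discrepancy(theta, covs))

if __name__ == "__main__":
    # Hypothetical inputs: the first covariance comes from two large GWASs
    # (small SE); the other two involve a smaller GWAS (larger SE).
    covs = [0.20, 0.50, 0.50]
    ses = [0.01, 0.05, 0.05]
    print("DWLS estimate:", round(dwls_estimate(covs, ses), 4))
    print("ML estimate:  ", round(ml_estimate(covs), 4))
```

With these made-up inputs, the DWLS estimate is pulled toward the precisely estimated covariance of 0.20, while the ML estimate lands at the unweighted mean of the three covariances. Real genomic SEM models have more parameters and a full sampling covariance matrix, so this only illustrates the direction of the effect.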
