SNP p-values in common factor model

Lindsay Ackerman

Dec 2, 2025, 7:42:39 AM
to Genomic SEM Users

Hello! I am estimating a common factor model across a set of three residual variance parameters. The n-hat for this common factor is around 25,000, but according to userGWAS some SNPs have p-values that are implausibly small for this sample size (e.g., < 10^-20). To troubleshoot, I freed all the parameter constraints imposed by userGWAS (setting fix_measurement to FALSE), and this led to p-values that seem more reasonable for the sample size (e.g., 10^-10). However, because the usual best practice for userGWAS is to constrain the parameters in the measurement model, I was wondering whether this is an appropriate course of action before moving forward with these estimates. I am happy to provide model output, code, and a set of 10 SNPs for troubleshooting if helpful. Thank you!
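As a rough sanity check on why p-values this small look implausible at an n-hat of about 25,000, the sketch below converts a two-sided p-value into |Z| and an approximate variance explained by a single SNP. It assumes a standardized outcome, treats n-hat as the effective sample size, and uses the large-sample approximation R^2 = Z^2 / (Z^2 + N); the function name `implied_r2` is made up for illustration and is not part of GenomicSEM.

```python
from statistics import NormalDist


def implied_r2(p_two_sided: float, n: float) -> tuple[float, float]:
    """Return (|Z|, approximate variance explained) for a two-sided p-value.

    Assumption: R^2 is approximated as Z^2 / (Z^2 + N), which is only a
    large-sample heuristic for a single-SNP association test.
    """
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)  # |Z| for a two-sided test
    r2 = z * z / (z * z + n)
    return z, r2


N_HAT = 25_000  # approximate n-hat quoted in the post

for p in (1e-20, 1e-10):
    z, r2 = implied_r2(p, N_HAT)
    print(f"p = {p:.0e}: |Z| = {z:.2f}, implied R^2 = {100 * r2:.2f}%")
```

Under these assumptions, a p-value below 10^-20 implies that one SNP explains a few tenths of a percent of the variance in the factor, which is large for a single common variant on a complex trait.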

Elliot Tucker-Drob

Dec 2, 2025, 9:55:25 AM
to Lindsay Ackerman, Genomic SEM Users
It would be useful to see both the conditional model (with SNPs) and the unconditional model, along with the estimates from the unconditional model, the S matrix, the I matrix, and the Ns for the contributing GWAS. I'm also not sure what you mean by a set of three residual variance parameters; perhaps you just mean 3 indicators?


Lindsay Ackerman

Dec 2, 2025, 3:54:33 PM
to Genomic SEM Users

Thank you for your quick response! To answer your question, we are modeling three lower-order latent factors (AgrSSPV, ConSSPV, EmoSSPV) which load onto the higher-order common factor (SSPV). The three lower-order factors are residual variance parameters in that they represent the residual of a regression (self-report regressed on other-report). I have included a path diagram in the Word doc attached in case it is helpful.

I have also provided in the Word document the S, I, and N matrices from the ldsc output (though the full ldsc output is also attached, ldsc_model_noEO.rds). The Word doc also includes the unconditional and conditional model code. Finally, the unconditional model results are found in OneFac2.rds. Happy to provide anything else!

Lindsay

OneFac2.rds
ldsc_model_noEO.rds
GWAS Info for GenomicSEM Folks.docx

Elliot Tucker-Drob

Dec 2, 2025, 10:31:34 PM
to Lindsay Ackerman, Genomic SEM Users
Hi Lindsay,

I see what you are trying to get at. I think the issue is that you are specifying SNP effects on the common factor while controlling for the other reports on the individual self-report indicators, but you have only allowed the SNP to affect the indicators by way of the factor. Since the model gives the SNP no path to the other reports, which in turn affect the self-reports, its effects on the other reports are not actually controlled for. I suspect that for SNPs with nonzero effects on other reports (which may be many of the same ones with nonzero effects on self-reports), there will be a good deal of misfit in this model, and the resulting parameter estimates (and thus their Z statistics and p-values) will not be trustworthy.

You could alter your model to allow for SNP effects on the 3 other reports in addition to the SNP effect on the common factor of the self-report residuals. I think this would get closer to what you are going for. It might be similar to performing GWAS-by-subtraction to remove the other reports from the self-reports, one indicator at a time, and then using the resulting sumstats as indicators in a standard common factor GWAS. The GWAS-by-subtraction may not be as trustworthy, though, given that the Ns for the other reports are on the lower end (and much lower than those for the self-reports). A similar model would be to have the SNP affect a common factor of the other reports (in place of having the other reports correlate with one another) in addition to having it affect the common factor of the self-report residuals. You would still have the regressions of the self-report indicators on the corresponding other-report indicators.
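To make the two suggestions concrete, here is a minimal sketch that only generates the extra lavaan-style lines involving the SNP; the rest of the measurement model (the residual factors and the self-on-other regressions) would stay as already specified. The other-report names `AgrOther`, `ConOther`, `EmoOther`, the factor name `OtherF`, and both helper functions are hypothetical placeholders, since the actual variable names are in the attached document and not in this thread.

```python
def snp_paths_direct(self_factor: str, other_reports: list[str], snp: str = "SNP") -> str:
    """Option 1: SNP affects the self-residual factor and each other report directly."""
    lines = [f"{self_factor} ~ {snp}"]
    lines += [f"{name} ~ {snp}" for name in other_reports]
    return "\n".join(lines)


def snp_paths_via_factor(self_factor: str, other_reports: list[str],
                         other_factor: str = "OtherF", snp: str = "SNP") -> str:
    """Option 2: SNP affects a common factor of the other reports instead."""
    lines = [
        f"{other_factor} =~ " + " + ".join(other_reports),  # replaces other-report covariances
        f"{other_factor} ~ {snp}",
        f"{self_factor} ~ {snp}",
    ]
    return "\n".join(lines)


# Hypothetical other-report indicator names, for illustration only.
others = ["AgrOther", "ConOther", "EmoOther"]

print(snp_paths_direct("SSPV", others))
print()
print(snp_paths_via_factor("SSPV", others))
```

The printed lines are meant to be appended to the existing model string passed to userGWAS. In the second option, any covariances among the other reports in the original model would be dropped in favor of the factor.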

I think that freeing the factor loadings was a bit of a red herring. I suspect that freeing the loadings produced somewhat more reasonable p-values because it gave the misspecified model more flexibility to adapt to the data pattern, shifting the measurement model to absorb some of the misfit. I would guess that the misfit was still high in many of those cases, but perhaps not as severe as with the constrained measurement model.

I hope that helps!

Elliot

Lindsay Ackerman

Dec 4, 2025, 10:42:57 AM
to Genomic SEM Users
Hi Elliot, 

This is very helpful! Thanks for taking the time to think through this issue and detail this all out. We have a few ideas for how to move forward then. 

Kindly,

Lindsay
