Discrepancy in genetic correlation (rg) estimates between HDL and LDSC using identical summary statistics and effective sample sizes


Kelvin Supriami

Jan 7, 2026, 12:18:35 PM
to Genomic SEM Users

Hi,

I’m encountering a substantial discrepancy in estimated genetic correlations (rg) when comparing HDL and LDSC, despite using the same GWAS summary statistics and downstream GenomicSEM model. I would appreciate guidance on whether this behavior is expected or indicates a setup issue.

Using three binary, non-overlapping disease traits (European ancestry GWAS), I observe markedly different rg estimates depending on whether the genetic covariance matrix is estimated via HDL or LDSC, even though:
• The same summary statistics are used
• Effective sample size conventions are applied when munging the summary statistics
• sample.prev = 0.5 is specified for all traits
• Liability-scale conversion is performed using plausible population prevalences
• The same saturated GenomicSEM correlation model is applied
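For reference, the two covariance structures are produced along the following lines. This is a minimal sketch, not my exact script: file names, LD reference paths, and the population prevalences shown here are placeholders, and I am assuming the standard `ldsc()` and `hdl()` interfaces in the GenomicSEM package.

```r
library(GenomicSEM)

traits <- c("T1.sumstats.gz", "T2.sumstats.gz", "T3.sumstats.gz")  # munged with effective N
sample.prev <- c(0.5, 0.5, 0.5)          # effective-N convention for binary traits
population.prev <- c(0.01, 0.02, 0.05)   # placeholder population prevalences
trait.names <- c("T1", "T2", "T3")

# LDSC-based S and V (European LD scores)
LDSCoutput <- ldsc(traits = traits,
                   sample.prev = sample.prev,
                   population.prev = population.prev,
                   ld = "eur_w_ld_chr/", wld = "eur_w_ld_chr/",
                   trait.names = trait.names)

# HDL-based S and V (HDL LD reference panel)
HDLoutput <- hdl(traits = traits,
                 sample.prev = sample.prev,
                 population.prev = population.prev,
                 trait.names = trait.names,
                 LD.path = "UKB_imputed_SVD_eigen99_extraction/")
```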

The SEM layer appears to behave as expected (i.e., it reproduces cov2cor(S)), so the discrepancy seems to originate from differences in the estimated genetic covariance matrix S produced by HDL vs LDSC.


model <- '
  lT1 =~ NA*T1
  lT2 =~ NA*T2
  lT3 =~ NA*T3

  T1 ~~ 0*T1
  T2 ~~ 0*T2
  T3 ~~ 0*T3
  T1 ~~ 0*T2
  T1 ~~ 0*T3
  T2 ~~ 0*T3

  lT1 ~~ 1*lT1
  lT2 ~~ 1*lT2
  lT3 ~~ 1*lT3

  lT1 ~~ lT2
  lT1 ~~ lT3
  lT2 ~~ lT3
'

This model is used only to extract rg (latent–latent correlations), not to test factor structure.
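The check that the SEM layer reproduces cov2cor(S) is done roughly as below, assuming `LDSCoutput` and `HDLoutput` are the objects returned by GenomicSEM's `ldsc()` and `hdl()` (object names are illustrative):

```r
# rg implied directly by each genetic covariance matrix S
rg_ldsc <- cov2cor(LDSCoutput$S)
rg_hdl  <- cov2cor(HDLoutput$S)

# fit the saturated correlation model on each covariance structure
fit_ldsc <- usermodel(LDSCoutput, estimation = "DWLS", model = model)
fit_hdl  <- usermodel(HDLoutput,  estimation = "DWLS", model = model)

# the latent-latent covariances in fit_*$results match the
# off-diagonals of the corresponding rg_* matrix in each case
round(rg_ldsc, 3)
round(rg_hdl, 3)
```

In both cases the estimated latent correlations agree with cov2cor(S) for that method, which is why I believe the discrepancy sits in S itself rather than in the model.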

What I observed:
Using the LDSC-based covariance structure:
• rg estimates are moderate and internally consistent across trait pairs

Using the HDL-based covariance structure:
• rg estimates are substantially larger for two of the three trait pairs
• standard errors are also noticeably larger for those pairs

This pattern is reproducible and persists after confirming that the SEM output matches cov2cor(S) for each method.

My questions are:

  1. Is it expected that HDL and LDSC may yield substantially different rg estimates for the same traits, even when effective sample sizes and liability-scale conversion are handled consistently?
  2. Does the use of sample.prev = 0.5 interact differently with HDL vs LDSC for meta-analyzed binary traits?
  3. Are there recommended best practices (e.g., SNP filtering, LD reference harmonization, prevalence specification) to improve comparability between HDL- and LDSC-derived covariance matrices?
  4. In cases of disagreement, is one estimator generally preferred for downstream GenomicSEM modeling?

Thanks very much!

Screenshot 2026-01-05 at 7.23.14 PM.png
Screenshot 2026-01-05 at 7.22.18 PM.png