Mega vs common-factor MV-GWAS: expected -log10 P difference? (HS rats)

Apurva Chitre

Jul 16, 2025, 2:59:24 AM

to Genomic SEM Users

I am analysing a behavioural trait in 7,695 heterogeneous‑stock (HS) rats drawn from six fully independent cohorts (no sample overlap). In HS rats, sample sizes smaller than those typical of human GWAS are expected to be adequate because LD is more extensive.

LDSC was unstable here (relatedness, long‑range LD). Following Jiang 2024, Bioinformatics, we estimated the S and V matrices with MPH (REML) and supplied them, together with the trait‑level univariate summary statistics, to Genomic SEM v0.0.5.

What we observe

38 independent loci at the 5% genome‑wide α (‑log₁₀P ≥ 5.64):
- 22 unique to the common‑factor MV‑GWAS
- 4 unique to the pooled univariate “mega” GWAS (MLMA‑LOCO in GCTA)
- 5 significant in both the MV and mega scans
- 7 seen only in single‑cohort scans (2 of these also overlap an MV or UV hit)

Top P‑values:
- Single‑cohort scans: ‑log₁₀P ≈ 2–4
- Mega scan: ‑log₁₀P ≈ 7
- One‑factor MV‑GWAS: ‑log₁₀P ≈ 12–25
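
For a sense of scale, here is a back-of-envelope sketch in R of how six independent, individually modest signals can combine. It uses an equal-weight Stouffer combination, which is a stand-in chosen purely for illustration and is not the estimator used by commonfactorGWAS(). The per-cohort P-value of 1e-3 and the perfect sign agreement are assumed inputs, not values taken from the scans above.

```r
# Back-of-envelope only: equal-weight Stouffer combination of k independent
# cohort Z statistics. This is NOT the Genomic SEM common-factor estimator.
stouffer_neglog10p <- function(z) {
  z_comb <- sum(z) / sqrt(length(z))
  # Work on the log scale so very small tail probabilities do not underflow
  log_p <- log(2) + pnorm(-abs(z_comb), log.p = TRUE)
  -log_p / log(10)
}

# Assumed inputs: six cohorts, each with two-sided P = 1e-3, all same sign
p_single <- 1e-3
z_single <- qnorm(p_single / 2, lower.tail = FALSE)
z <- rep(z_single, 6)

neglog10p_combined <- stouffer_neglog10p(z)
print(neglog10p_combined)
```

Under these assumptions the combined ‑log₁₀P lands well above the per-cohort value of 3. How far a real analysis departs from this depends on the loadings, the heritabilities, and how consistent the SNP effects are across cohorts.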

My concern / question

Is it expected to see such a dramatic jump in significance (‑log₁₀P up to 25) when moving from the mega scan to the common‑factor MV‑GWAS, even though no single cohort has a very strong hit on its own?

Are there additional sanity checks you would recommend to reassure reviewers that the MV‑only peaks are genuine rather than artefacts?

Here are some diagnostics:

- Q_SNP heterogeneity: 80% of MV lead SNPs pass (Bonferroni‑adjusted P > 0.05)
- Sign concordance: 11/30 MV loci show perfect 6‑of‑6 sign agreement (binomial P = 8.7 × 10⁻¹⁰)
- Common‑factor model fit: χ² = 25.137, df = 9, CFI = 0.902, SRMR = 0.123
- 2‑factor alternative (biologically less interpretable): χ² = 13.046, df = 8, CFI = 0.969, SRMR = 0.090
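
The sign-concordance P-value can be re-derived with a short R check. This is a sketch that assumes the null is independent, equiprobable signs across the six cohorts (so a locus shows 6-of-6 agreement with probability 2 × 0.5⁶) and a one-sided binomial test. The variable names are illustrative.

```r
# Assumed null: each cohort's effect sign is an independent coin flip,
# so all six agree (all + or all -) with probability 2 * 0.5^6 = 1/32.
n_cohorts   <- 6
p_all_agree <- 2 * 0.5^n_cohorts

n_loci       <- 30  # MV loci examined (from the post)
n_concordant <- 11  # loci with 6-of-6 sign agreement (from the post)

# One-sided binomial test: P(X >= 11 | n = 30, p = 1/32)
bt <- binom.test(n_concordant, n_loci, p = p_all_agree,
                 alternative = "greater")
sign_p <- bt$p.value
print(sign_p)
```

Under this null the result should come out close to the 8.7 × 10⁻¹⁰ quoted above. A different null (for example, one that conditions on the sign of the MV estimate) would give a different number.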

Handling stratification / relatedness

Every univariate GWAS uses MLMA-LOCO + GRM in GCTA.
The same GRM underlies MPH REML, so S and V already incorporate that correction.
Therefore we set the I matrix to the identity and GC = "none" in commonfactorGWAS().

## diagonal of S (h² estimates)
diag(mph_out$S)
[1] 0.300736 0.308355 0.271685 0.241944 0.473822 0.264885

## Standard errors of those h² estimates
k <- nrow(mph_out$S)
SE <- matrix(0, k, k)
SE[lower.tri(SE, diag = TRUE)] <- sqrt(diag(mph_out$V))
diag(SE)
[1] 0.0330158 0.0367704 0.0442432 0.0756440 0.0818507 0.0717718

## Z statistics (h² / SE)
diag(mph_out$S)/diag(SE)
[1] 9.108851 8.385957 6.140718 3.198456 5.788857 3.690656

Appreciate any guidance or insights you can provide.

Thank you.

Elliot Tucker-Drob

Jul 16, 2025, 6:23:21 AM

to Apurva Chitre, Genomic SEM Users

If the traits are measured in the same individuals, you cannot set the I matrix to identity. The off-diagonals of I should be nonzero, as they index dependencies among the estimation errors of the SNP effects that inform the common-factor GWAS. Setting the off-diagonals to 0 will produce overly narrow SEs (P-values biased downward). Since you are not using LDSC, you don’t have those estimates. However, you can obtain the needed elements from the phenotypic correlations among the traits: either meta-analyze those correlations across cohorts, or pool the data and estimate them directly. If there is missing data, you’ll need to account for that as well; the formula for the cross-trait intercept in multivariate LDSC shows how.
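
As a rough sketch of that construction in R: the expected cross-trait LDSC intercept is approximately r_pheno × N_overlap / sqrt(N_i × N_j), so the off-diagonals of I can be filled from phenotypic correlations and overlap counts. The function name build_I and all input values below are hypothetical, and the diagonal is simply set to 1 (that is, no residual inflation is assumed).

```r
# Hypothetical helper: fill I from phenotypic correlations and sample overlap.
# Off-diagonal (i, j) ~= r_pheno[i, j] * N_overlap[i, j] / sqrt(N[i] * N[j])
build_I <- function(r_pheno, N_overlap, N) {
  stopifnot(is.matrix(r_pheno),
            all(dim(r_pheno) == dim(N_overlap)),
            nrow(r_pheno) == length(N))
  I <- r_pheno * N_overlap / sqrt(outer(N, N))
  diag(I) <- 1  # assumption: univariate intercepts fixed at 1
  I
}

# Made-up example: three traits, traits 1 and 2 share 800 individuals
N <- c(1000, 1200, 900)
r_pheno <- matrix(c(1.0, 0.4, 0.2,
                    0.4, 1.0, 0.3,
                    0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)
N_overlap <- matrix(0, 3, 3)
N_overlap[1, 2] <- N_overlap[2, 1] <- 800

I_partial <- build_I(r_pheno, N_overlap, N)
print(round(I_partial, 3))

# With zero overlap everywhere, the result is the identity matrix
I_none <- build_I(r_pheno, matrix(0, 3, 3), N)
print(I_none)
```

Missing data enters through N_overlap: only individuals with both phenotypes observed count toward the overlap for that pair of traits.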

Elliot

Apurva Chitre

Jul 16, 2025, 12:17:22 PM

to Elliot Tucker-Drob, Genomic SEM Users

Hi Elliot,

Thanks for the quick feedback.
Just to clarify: the six traits were measured in completely non-overlapping cohorts, so there is zero sample overlap. Because of that we set I = identity and GC = "none" in commonfactorGWAS().

Please let me know if you think any additional adjustment is still needed.

Many thanks for your help!
Best,
Apurva

Elliot Tucker-Drob

Jul 16, 2025, 2:23:23 PM

to Apurva Chitre, Genomic SEM Users

Ahh, great. In that case, the specification you described should be appropriate. By mega scan, I assume you mean performing a single GWAS on the pooled data. To the extent that heritabilities differ across phenotypes/cohorts and factor loadings are variable, the common-factor GWAS should be expected to be better powered. I don’t have any strong intuitions for you, though, especially since I don’t know much about your phenotypes or your measurement model, and I don’t routinely work with non-human data.

Elliot
