Sample size for munge function

Sanghyeon Park

Dec 28, 2021, 12:36:32 AM
to Genomic SEM Users
Hi all,

I am trying to run the munge function for binary traits, but I am confused about the sample size.

The introduction to section 3 of the wiki says that for case/control designs the total sample size should be provided and NOT the effective sample size.

But the argument description for the munge function says that for case/control designs the effective sample size has to be specified.

Which one is correct?

Best,

Austin

Elliot Tucker-Drob

Dec 28, 2021, 12:24:42 PM
to Sanghyeon Park, Genomic SEM Users
We have transitioned to recommending that you provide the effective sample size and set the sample prevalence to .5 in LDSC. The introduction to section 3 of the wiki has been edited accordingly.

For a single-cohort GWAS of a binary trait, the effective sample size (i.e. the equivalent sample size for a balanced case-control study) is computed as 4v(1 - v)n, where v is the true sample prevalence and n is the total sample size. For summary statistics from a meta-analysis of multiple case-control GWAS, you should use the sum of the cohort-level effective sample sizes.
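
As a worked illustration of the formula above, here is a minimal Python sketch. It is not part of Genomic SEM; the function names and cohort numbers are made up for the example. It computes 4v(1 - v)n for one cohort and sums the per-cohort values for a meta-analysis.

```python
def effective_n(n_total, sample_prev):
    """Effective sample size 4 * v * (1 - v) * n for one case-control cohort."""
    if not 0.0 < sample_prev < 1.0:
        raise ValueError("sample_prev must be strictly between 0 and 1")
    return 4.0 * sample_prev * (1.0 - sample_prev) * n_total


def summed_effective_n(cohorts):
    """Sum of per-cohort effective sample sizes for a meta-analysis.

    cohorts is an iterable of (n_total, sample_prev) pairs.
    """
    return sum(effective_n(n, v) for n, v in cohorts)


# Hypothetical cohorts: (total N, proportion of cases in that cohort)
cohorts = [(10000, 0.5), (20000, 0.1), (50000, 0.02)]

single = effective_n(20000, 0.1)        # one unbalanced cohort
combined = summed_effective_n(cohorts)  # meta-analysis of all three

print(round(single), round(combined))
```

Note that a balanced cohort (v = .5) has an effective N equal to its total N, and that the meta-analysis value is built from each cohort's own prevalence rather than from one pooled prevalence.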

See the following preprint for further details:

Grotzinger, A. D., de la Fuente, J., Nivard, M. G., & Tucker-Drob, E. M. (2021). Pervasive downward bias in estimates of liability scale heritability in GWAS meta-analysis: A simple solution. medRχiv.




Sanghyeon Park

Dec 28, 2021, 10:02:24 PM
to Genomic SEM Users
Thank you for clarifying.

Best,
Austin

Sanghyeon Park

Dec 29, 2021, 6:29:27 PM
to Genomic SEM Users
Hi,

I have a follow-up question regarding effective sample size. I have a binary trait from a single cohort, but the cases and controls are unbalanced, which results in a very small sample prevalence and effective sample size.

So when I generate a common factor, the following message pops up.

"A difference greater than .025 was observed pre- and post-smoothing in the genetic covariance matrix. This reflects a large difference and results should be interpreted with caution!! This can often result from including low powered traits, and you might consider removing those traits from the model. If you are going to run a multivariate GWAS we strongly recommend setting the smooth_check argument to true to check smoothing for each SNP."  

Is there a way to handle an unbalanced case/control GWAS? (I have checked that the message does not appear if I exclude this trait.)

Best,
Austin

Elliot Tucker-Drob

Dec 29, 2021, 6:56:06 PM
to Sanghyeon Park, Genomic SEM Users
The method described in the Grotzinger (2021) preprint that I referred to previously accounts for the unbalanced nature of case-control GWAS. The warning that you are getting is not related to whether the GWAS is balanced or not. It refers to the fact that the empirical genetic covariance matrix is non-positive definite, and that the closest positive definite matrix to the empirical matrix has some elements that are very different from the original. This typically happens when one or more of the GWASs has low power, such that the h2 and rG estimates involving that GWAS are noisy. Indeed, you mention that the case-control trait has a low effective N (i.e. low power), and that when you remove this trait, the warning goes away. While it is true that a case-control GWAS with very few cases has low power, even when the total N is large, it is the low power that is relevant, not the unbalanced nature of the GWAS per se.
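
To make the point about power concrete, here is a small Python sketch with illustrative numbers only (it is not output from Genomic SEM, and the function name is made up). It holds a hypothetical total N fixed at 100,000 and shrinks the number of cases, using the same 4v(1 - v)n formula given earlier in the thread, rewritten in terms of case and control counts.

```python
def effective_n_from_counts(n_cases, n_controls):
    """4 * v * (1 - v) * n with v = n_cases / n, which simplifies to
    4 * n_cases * n_controls / (n_cases + n_controls)."""
    n_total = n_cases + n_controls
    return 4.0 * n_cases * n_controls / n_total


n_total = 100000  # hypothetical total sample size, held fixed
for n_cases in (50000, 10000, 1000, 200):
    n_controls = n_total - n_cases
    n_eff = effective_n_from_counts(n_cases, n_controls)
    print(f"cases={n_cases:>6}  controls={n_controls:>6}  effective N={n_eff:>9.1f}")
```

With 200 cases the effective N is under 800 even though the total N is 100,000, which is the sense in which a heavily unbalanced GWAS can behave like a small, low-powered study.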


Sanghyeon Park

Dec 30, 2021, 1:14:08 AM
to Genomic SEM Users
Thanks for the quick reply!

Best,
Austin
