Hi,
I am running into the following bias when applying LDSC to sumstats obtained from commonly used regression techniques in GWAS: LDSC heritability estimates are zero or negative when the input file is the result of a mixed-model GWAS in which each SNP is both included as a fixed effect and contributes to the GRM from which I estimate the random effect.
To confirm this, I:
1. simulated a phenotype with h2 = .5 and 10% of SNPs carrying a true effect (a rough sketch of this step is given after the list);
2. ran a linear association in PLINK 1.9, munged the results, and ran LDSC, retrieving h2 = .42;
3. ran a mixed-model association in GCTA on the same simulated phenotype, with all SNPs contributing to the random effect, and retrieved h2 = -0.0809 (0.0675).
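For concreteness, step 1 was along these lines. This is only a minimal sketch: the matrix `G`, the sample/SNP counts, and the use of standard-normal columns in place of real standardized genotypes are placeholders; in the actual runs the phenotype was simulated on genotype data and then analysed with PLINK, GCTA, munge_sumstats.py and ldsc.py.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder dimensions; the real data are of course much larger.
n, m = 2_000, 10_000           # individuals, SNPs
h2, prop_causal = 0.5, 0.10    # target heritability, fraction of causal SNPs

# G stands in for an n x m matrix of standardized genotypes
# (mean 0, variance 1 per SNP); here it is just random noise.
G = rng.standard_normal((n, m))

# Pick which SNPs are causal and give them effects scaled so that
# the genetic values end up with variance ~ h2.
m_causal = int(prop_causal * m)
causal = rng.choice(m, size=m_causal, replace=False)
beta = np.zeros(m)
beta[causal] = rng.normal(0.0, np.sqrt(h2 / m_causal), size=m_causal)

g = G @ beta                                  # genetic values, variance ~ h2
e = rng.normal(0.0, np.sqrt(1.0 - h2), n)     # environmental noise, variance ~ 1 - h2
y = g + e                                     # phenotype with h2 ~ 0.5
```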
I ran this a couple of times, each time re-sampling which SNPs carry a true effect, and the results were consistent across runs. My diagnosis would be that when running an MLM with the SNP of interest included in the random-effect matrix (GRM), the standard errors of high-LD SNPs are pushed up disproportionately (after all, they have many LD buddies in the GRM, so their effect estimates correlate with the random effect).
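To make the suspected mechanism concrete, here is a purely numerical toy (not based on my actual runs): LDSC estimates h2 from the slope of the chi-square statistics on LD score, so if the MLM deflates chi-square by an amount that grows with a SNP's LD score, that slope, and with it the h2 estimate, is dragged towards zero or below. The LD-score distribution, the deflation curve, and all constants below are invented purely to show the direction of the bias.

```python
import numpy as np

rng = np.random.default_rng(2)

n_snps, N, M, h2 = 20_000, 50_000, 1_000_000, 0.5
ld = rng.gamma(shape=2.0, scale=50.0, size=n_snps)   # made-up LD scores

# LDSC model for an unbiased (OLS-style) GWAS: E[chi2] = N*h2/M * l + 1,
# with some Gaussian noise as a crude stand-in for sampling variation.
chi2_ols = 1.0 + N * h2 / M * ld + rng.normal(0.0, 0.5, n_snps)

# Crude stand-in for the hypothesized MLM behaviour: standard errors (and
# hence chi2) deflated more strongly for SNPs with more LD buddies in the GRM.
chi2_mlm = chi2_ols / (1.0 + 0.05 * ld)

def implied_h2(chi2, ld, N, M):
    """h2 implied by an unweighted regression of chi2 on LD score."""
    slope = np.polyfit(ld, chi2, 1)[0]
    return slope * M / N

print("h2 from OLS-like chi2:", round(implied_h2(chi2_ols, ld, N, M), 3))  # ~ 0.5
print("h2 from MLM-like chi2:", round(implied_h2(chi2_mlm, ld, N, M), 3))  # near or below zero
```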
If I find the time, I'll run some more extensive simulations to confirm the diagnosis. Any suggestions from the LDSC group would be welcome; perhaps I have overlooked other possible causes of the results I obtain.
If the LDSC team agrees with my diagnosis, this might be something to note on the FAQ/readme page on GitHub, as in a GWAMA it needs to be considered at the cohort level, before meta-analysis, and cannot easily be fixed retrospectively.
Obviously, the resulting effect on LDSC estimates depends on what proportion of cohorts in a GWAMA run an MLM with all SNPs in the background, but in some cases it could amount to appreciable bias.
Best,
Michel Nivard