Excluding genomic regions from LDSC analyses

Anil Ori

May 9, 2017, 6:54:02 PM

to ldsc_users

Hi LDSC team,

I have a question about the best strategy for removing regions, such as HLA and APOE, from my analyses. That is, at which step do you recommend excluding them? Should this happen at the level of LD score estimation, or is exclusion from the summary statistics file or the weights sufficient?

Thanks, Anil

Raymond Walters

May 10, 2017, 2:13:57 PM

to Anil Ori, ldsc_users

Hi Anil,

Depends a bit on the reason for exclusion.

a) The MHC is normally excluded from computing weights. The unusual LD in the MHC is liable to produce outlier LD scores that may substantially influence the regression, plus it’s not unreasonable to hypothesize that the assumptions about polygenicity and neutral genetic drift underlying ldsc might be a poor representation of effects in the region. The precomputed LD score weights provided for download should already reflect this exclusion.

b) For loci with extreme effect sizes (e.g. APOE), exclusion should happen in the summary statistics, since the necessary exclusions (if any) are going to be different for each phenotype. There’s no specific recommended method for defining the exclusion. If you have a region for the locus defined by e.g. LD clumping, that’s probably sufficient, or you could drop some region (e.g. 1 cM) around the lead SNP.
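As a rough illustration of dropping a region from the summary statistics before running ldsc, here is a minimal sketch. It is not part of ldsc. It assumes the summary statistics carry chromosome and base-pair position columns (named CHR and BP here, which is an assumption), uses a fixed base-pair window as a stand-in for a 1 cM window (a true cM window would need a genetic map), and the coordinates, SNP names, and helper names (drop_region, drop_window) are purely illustrative.

```python
def drop_region(snps, chrom, start_bp, end_bp):
    """Return the SNP records that fall outside chrom:start_bp-end_bp (inclusive)."""
    return [
        s for s in snps
        if not (s["CHR"] == chrom and start_bp <= s["BP"] <= end_bp)
    ]


def drop_window(snps, chrom, lead_bp, half_width_bp=500_000):
    """Drop a symmetric base-pair window around a lead SNP.

    half_width_bp is an arbitrary illustrative default, not an ldsc setting.
    """
    start_bp = max(0, lead_bp - half_width_bp)
    return drop_region(snps, chrom, start_bp, lead_bp + half_width_bp)


# Toy summary statistics; all values are made up for illustration.
sumstats = [
    {"SNP": "rs_a", "CHR": 6, "BP": 30_000_000, "Z": 9.1},
    {"SNP": "rs_b", "CHR": 6, "BP": 50_000_000, "Z": 0.4},
    {"SNP": "rs_c", "CHR": 19, "BP": 45_400_000, "Z": 25.0},
    {"SNP": "rs_d", "CHR": 19, "BP": 47_000_000, "Z": -1.2},
]

# Drop a broad chr6 interval (illustrative boundaries, build-dependent),
# then a window around a hypothetical lead SNP on chr19.
kept = drop_region(sumstats, 6, 25_000_000, 34_000_000)
kept = drop_window(kept, 19, 45_400_000)
print([s["SNP"] for s in kept])
```

The filtered records would then be written back out and passed through the usual summary statistics preparation. Region boundaries depend on the genome build, so they should be checked against the build of your data.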

Also, if ldsc detects highly significant SNPs it suspects are outliers, it will by default drop those top SNPs by imposing a threshold on the maximum chi-square statistic. This should appear in the log file with the message:

"Removed X SNPs with chi^2 > X (X SNPs remain)"
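To make the chi-square cap concrete, here is a minimal sketch of that kind of filter. It is not the ldsc source code: the function name filter_chisq, the Z column, and the threshold value of 80 are assumptions chosen for illustration, and how ldsc picks its own threshold is not shown here. It assumes one Z score per SNP, so chi^2 = Z^2.

```python
def filter_chisq(snps, chisq_max):
    """Keep SNPs whose chi-square statistic (Z squared) does not exceed chisq_max.

    Returns the kept records and a message in the same form as the log line
    quoted above.
    """
    kept = [s for s in snps if s["Z"] ** 2 <= chisq_max]
    message = "Removed {M} SNPs with chi^2 > {C} ({N} SNPs remain)".format(
        M=len(snps) - len(kept), C=chisq_max, N=len(kept)
    )
    return kept, message


# Toy Z scores; chi^2 values are 1, 25 and 100.
toy = [
    {"SNP": "rs_1", "Z": 1.0},
    {"SNP": "rs_2", "Z": -5.0},
    {"SNP": "rs_3", "Z": 10.0},
]

kept_snps, log_line = filter_chisq(toy, chisq_max=80)  # 80 is illustrative only
print(log_line)
```

If you want to control the cap yourself rather than rely on the default, ldsc exposes a --chisq-max option; checking the log for the message above tells you whether any SNPs were actually dropped.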

Cheers,

Raymond

Anil Ori

May 10, 2017, 4:36:32 PM

to ldsc_users, anil.p...@gmail.com

Thanks, Raymond.

So, are SNPs that are excluded from computing the weights also excluded from the regression? I am asking for the case where a locus with an extreme effect resides in the MHC region (e.g. SCZ). If it is excluded from the weights, can I assume it is also excluded from the regression, or should I also remove it from the summary statistics?

Best, Anil

On Wednesday, May 10, 2017 at 11:13:57 UTC-7, Raymond Walters wrote:

Raymond Walters

May 10, 2017, 6:02:24 PM

to Anil Ori, ldsc_users

Hi Anil,

That’s correct, the regression will be computed with just the SNPs that have weights, LD scores, and summary statistics. If any of those three are missing then the SNP will be excluded from the regression.
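In other words, the regression SNP set is the intersection of the three sets. A minimal sketch of that logic, with made-up SNP IDs and a hypothetical helper name (regression_snps), not ldsc's own code:

```python
def regression_snps(weight_snps, ldscore_snps, sumstat_snps):
    """Return the SNP IDs present in all three inputs."""
    return set(weight_snps) & set(ldscore_snps) & set(sumstat_snps)


# Toy SNP ID lists; rs_mhc stands for a SNP that has no regression weight.
weights = ["rs_1", "rs_2", "rs_3"]
ld_scores = ["rs_1", "rs_2", "rs_3", "rs_mhc"]
sumstats_ids = ["rs_1", "rs_3", "rs_mhc", "rs_9"]

used = regression_snps(weights, ld_scores, sumstats_ids)
print(sorted(used))
```

Here rs_mhc is left out even though it has an LD score and a summary statistic, because it has no weight, which is the situation asked about for a locus inside the MHC.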

There’s some nuance to the relationship between the set of SNPs used for the regression vs. the set of SNPs used to compute the LD scores that’s nicely covered in the supplementary material of the ldsc partitioned heritability paper (PMID 26414678, under “Choice of regression SNPs and reference SNPs”). It’s probably worth reading to get some detail on how SNP exclusions in each set affect the interpretation of the results.

Cheers,

Raymond

Anil Ori

May 10, 2017, 6:19:53 PM

to ldsc_users, anil.p...@gmail.com

Hi Raymond,

Excellent, thank you for the help!

Best, Anil

On Wednesday, May 10, 2017 at 15:02:24 UTC-7, Raymond Walters wrote:
