LD Score calculation in non-human system, no genetic map

Alex Gileta

Aug 8, 2017, 1:23:32 PM
to ldsc_users
Hi, 

I am performing a GWAS on a handful of quantitative traits in a large sample of rats, and we are interested in the genetic correlations between traits and how they compare in two disparate populations of rats we are working with. I have genotypes in PLINK format and summary statistics from GEMMA. I would now like to use LDSC to estimate LD scores for a set of SNPs that overlaps between the two populations, but I do not have a genetic map available for my population(s). Is there a recommendation for this situation? The tutorial seems to assume that everyone using this program will be doing so in humans and doesn't offer alternative avenues for model systems. Any suggestions would be appreciated, thank you.

-Alex Gileta

Raymond Walters

Aug 8, 2017, 1:49:13 PM
to Alex Gileta, ldsc_users
Hi Alex,
There have been a handful of questions about non-human systems in the past, but to my knowledge nobody has fully implemented applying ldsc outside of humans.

Without a genetic map, ldsc will let you set the distance to use for computing LD scores either in kilobases (--ld-wind-kb) or in number of SNPs (--ld-wind-snps). The goal is a window that fully covers the size of all real LD blocks without being so large that you add noise by summing r^2 with SNPs that are effectively uncorrelated. The appropriate distance will depend on your organism, and may require some testing/tuning to validate in your data (especially if you’re using inbred rats, where I expect larger and longer-range LD).
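
The window idea described above can be sketched in a few lines of plain Python. This is only an illustration of what a kilobase window means for an LD score (the sum of r^2 between a SNP and every SNP inside the window), not ldsc's implementation: the function names, the toy genotypes, and the two window sizes are all made up for the example, and ldsc's own estimator also applies a small-sample bias adjustment to r^2 that is omitted here.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat the correlation as zero
    return (sxy * sxy) / (sxx * syy)


def ld_scores_by_kb(positions_bp, genotypes, window_kb):
    """LD score of SNP j = sum of r^2 with every SNP within window_kb (itself included).

    positions_bp: base-pair positions on one chromosome
    genotypes: one list of 0/1/2 allele counts per SNP, same order as positions_bp
    """
    window_bp = window_kb * 1000
    scores = []
    for j, pos_j in enumerate(positions_bp):
        total = 0.0
        for k, pos_k in enumerate(positions_bp):
            if abs(pos_k - pos_j) <= window_bp:
                total += pearson_r2(genotypes[j], genotypes[k])
        scores.append(total)
    return scores


# Hypothetical toy data: 4 SNPs genotyped in 6 individuals.
positions = [1000, 2000, 3000, 900000]
genos = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0, so r^2 = 1
    [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated, so r^2 = 1
    [0, 2, 1, 1, 0, 2],  # distant SNP, weakly correlated with the others
]

narrow = ld_scores_by_kb(positions, genos, window_kb=10)
wide = ld_scores_by_kb(positions, genos, window_kb=1000)
print(narrow)
print(wide)
```

Comparing `narrow` and `wide` shows the trade-off: the wider window picks up the weak correlation with the distant SNP, which is signal if the LD is real and noise if it is not.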

Before taking any ldsc results in rats at face value I’d recommend careful thought about the LD score regression model and ideally some simulation work to validate that the model works as intended in non-humans. The LD score regression model and its strong simplifying assumptions were developed with humans in mind, so it’s important to make sure you’re comfortable making those same assumptions before applying the method elsewhere.

Cheers,
Raymond



--
You received this message because you are subscribed to the Google Groups "ldsc_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ldsc_users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ldsc_users/fc8a2e34-4889-4347-acef-d88d790bcdff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Benjamin Neale

Aug 8, 2017, 1:50:57 PM
to Raymond Walters, Alex Gileta, ldsc_users
Tacking onto what Raymond said: cross-line/population rat analyses are also going to be pretty tricky because of the mismatch in LD between the populations. That would require some additional development, if I interpret your question correctly.

Best,
Ben

--
Benjamin Neale, Ph.D.
Assistant Professor
Analytic and Translational Genetics Unit
Massachusetts General Hospital
Institute Member
Stanley Center for Psychiatric Disease and Program in Medical and Population Genetics
Broad Institute

Alex Gileta

Aug 8, 2017, 3:32:26 PM
to ldsc_users, rwal...@broadinstitute.org, alex....@gmail.com
Thank you both for your quick replies. To clarify a bit further, the rats I'm working on are Sprague Dawleys, a commercially available, outbred line. The two populations I am referring to are both Sprague Dawley lines, but they come from different vendors (Charles River and Harlan) and have diverged over the years they have been bred separately. I have attached to the post a plot of the mean LD decay in these populations. The blocks are definitely much larger than in humans, but smaller than you might see in the HS, DO, AILs, etc. I suppose, though, that if you are trying to cover the size of all real LD blocks, then you want something like the 95th or 99th percentile of LD decay as a guideline instead, and choose the point where that plateaus?
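
The percentile idea raised above could be sketched roughly as follows. Everything here is an assumption for illustration: the input is taken to be a list of (distance in kb, r^2) pairs for SNP pairs (for example, exported from a pairwise LD run), the helper names are made up, and both the 100 kb bin width and the r^2 floor of 0.05 are arbitrary choices rather than recommendations.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    rank = (q / 100) * (len(s) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def upper_tail_decay(pairs, bin_kb, q=95):
    """Group (distance_kb, r2) pairs into distance bins; return {bin_start_kb: q-th percentile of r2}."""
    bins = {}
    for dist_kb, r2 in pairs:
        start = int(dist_kb // bin_kb) * bin_kb
        bins.setdefault(start, []).append(r2)
    return {start: percentile(vals, q) for start, vals in sorted(bins.items())}


def window_from_decay(decay, r2_floor):
    """Smallest bin start from which every later bin stays at or below r2_floor.

    Returns None if even the most distant bin is above the floor.
    """
    candidate = None
    for start in sorted(decay, reverse=True):
        if decay[start] > r2_floor:
            break
        candidate = start
    return candidate


# Hypothetical (distance_kb, r2) pairs, not taken from the attached plot.
pairs = [
    (50, 0.9), (80, 0.7),
    (150, 0.5), (180, 0.3),
    (250, 0.05), (290, 0.02),
    (350, 0.03), (380, 0.01),
]

decay = upper_tail_decay(pairs, bin_kb=100, q=95)
window_kb = window_from_decay(decay, r2_floor=0.05)
print(decay)
print(window_kb)
```

Using an upper percentile rather than the mean makes the chosen window track the longest blocks instead of the typical one, which is the point of looking in the tail of the decay curve.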

I will go ahead and try out a few values for the kilobase and SNP options and see how it goes. I will also consider the assumptions more carefully, thanks.

-Alex



LD_Decay_plot_SDandHS.pdf

Raymond Walters

Aug 8, 2017, 7:23:37 PM
to Alex Gileta, ldsc_users
Hi Alex,
Using outbred lines is a big help. Differences in LD between your two populations will definitely be an issue though. LD score regression does not work for genetic correlation between samples from populations with different LD structures. You might want to look at popcorn (software, paper), which provides a related model specifically to allow for these kinds of population differences.

As far as LD block window size, you definitely want to be out in the tail. For reference, our current recommendation in humans is 1 centimorgan, which averages out to about 1 Mb (see panel d for decay at that distance).

Cheers,
Raymond


Alex Gileta

Mar 29, 2019, 12:42:43 AM
to ldsc_users
Hello again. 

After a long hiatus, our lab is revisiting this question, this time with an updated genetic map in rats that we are able to use for LD score estimation. The map was created from a different strain than the two we are currently using, but we believe it should be a sufficiently close approximation. I have followed the guide and run the standard --l2 option for ldsc using the SNP BED/BIM/FAM files for both populations. I tried --ld-wind-cm sizes of 1 cM, 2 cM, 5 cM, and 10 cM since there is much more extensive LD in these strains. There wasn't much difference between 5 cM and 10 cM, so I feel comfortable choosing something in that range. I have pasted the ldsc log file summary data for 5 cM below. In the estimates for population 2, MAF and L2 have a strong negative correlation. Why might this happen, and is there anything that can be done to fix it? Further, are the numbers observed for population 1 reasonable?

POPULATION 1 (214k SNPs & Lower LD)
Summary of LD Scores in allChr.allSamps.90DR2.maf01.hweE7.noIBD.CharlesRiverOnly.ldScores_5CM.l2.ldscore.gz
         MAF        L2
mean  0.2267   88.2854
std   0.1489   83.4098
min   0.0101    0.4538
25%   0.0921   43.7627
50%   0.2140   71.3690
75%   0.3570  110.5772
max   0.5000  906.9256

MAF/LD Score Correlation Matrix
        MAF      L2
MAF  1.0000  0.2977
L2   0.2977  1.0000


POPULATION 2 (114k SNPs & Higher LD)
Summary of LD Scores in _________
         MAF        L2
mean  0.1439  112.4337
std   0.1438   91.2623
min   0.0101    0.5603
25%   0.0243   51.9996
50%   0.0747   93.8404
75%   0.2501  146.4080
max   0.5000  634.2415

MAF/LD Score Correlation Matrix
        MAF      L2
MAF  1.0000 -0.1408
L2  -0.1408  1.0000
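
One way to look further into the sign of the MAF/L2 correlation reported above would be to recompute it per SNP and tabulate mean L2 within MAF bins. The sketch below assumes you already have per-SNP MAF and L2 values loaded into two lists; the function names, the bin edges, and the six toy values are hypothetical and are not taken from the log output, and the sketch does not by itself explain the cause.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_l2_by_maf_bin(maf, l2, edges):
    """Mean L2 within MAF bins (lo, hi] defined by ascending edges.

    Returns one value per bin, or None for a bin with no SNPs.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for m, score in zip(maf, l2):
        for i in range(n_bins):
            if edges[i] < m <= edges[i + 1]:
                sums[i] += score
                counts[i] += 1
                break
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]


# Hypothetical per-SNP values for illustration only.
maf = [0.02, 0.03, 0.10, 0.15, 0.40, 0.45]
l2 = [150.0, 130.0, 100.0, 90.0, 60.0, 50.0]

r = pearson(maf, l2)
bin_means = mean_l2_by_maf_bin(maf, l2, edges=[0.0, 0.05, 0.2, 0.5])
print(round(r, 4))
print(bin_means)
```

The per-bin means show which part of the frequency spectrum drives the overall correlation, which a single coefficient cannot.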

-Alex