Can't find baseline model v1.2 in alkesgroup repository


R Reynolds

Jul 30, 2019, 11:00:34 AM
to ldsc_users
Hi,

I recently read this readme document (https://data.broadinstitute.org/alkesgroup/LDSCORE/readme_baseline_versions), which recommends that I use baseline v1.2 for identifying critical cell types. I cannot, however, find a v1.2 other than for the EAS population (1000G_Phase3_EAS_baseline_v1.2_ldscores.tgz). Could you point me to where I might find a full 1000G baseline v1.2?

Many thanks in advance,
Regina

Jory Schossau

Jul 30, 2019, 11:08:59 AM
to ldsc_users
Hi Regina,
It's actually in the hg38 folder, so you were close.

Happy hacking!
 - Jory

R Reynolds

Jul 31, 2019, 7:48:31 AM
to ldsc_users
Hi Jory,

Thanks for getting back to me so quickly.

Just to check, does this mean that this baseline model is mapped to GRCh38? If yes, how does this affect running S-LDSC using GWAS summary stats that are mapped to GRCh37? I'm assuming the two are not compatible -- is this correct?

Thanks again,
Regina

Jory Schossau

Jul 31, 2019, 9:19:12 AM
to ldsc_users
True, hg19 and hg38 are incompatible. However, I'm told the differences shouldn't be that big where summary statistics are concerned, so for instance we are using hg19 summary stats with the hg38 baseline.
(Apropos, I'm finishing a test on the difference just this morning.)
BUT, you really should be pairing them as you say, so if you have old summary stats, then you need to use the old baseline model.
Most of the world was using the hg19 model - the hg38 model was uploaded just last week, even though the data has been available since 2013.
 - Jory

R Reynolds

Jul 31, 2019, 11:15:04 AM
to ldsc_users
Oh that's interesting! Could you let me know what differences you identify, if any?
Also, when you say old baseline model, do you mean v1.1 (which only has the bug fix on the promoter annotation)? Do you know how big the difference between v1.1 and v1.2 is in terms of partitioned heritability estimates?

Best,
Regina

Jory Schossau

Jul 31, 2019, 3:51:23 PM
to ldsc_users
I'll definitely let you know when I finish in the next day or so.

Yes, that would be v1.1, and no, I won't know how big a difference that is until I'm done here. But it will be soon.

Jory Schossau

Jul 31, 2019, 5:26:30 PM
to ldsc_users
Okay, so I compared two runs: a standard hg19 sumstats file (PGC.SCZ2) with the hg38 baseline, versus the same sumstats file lifted over to hg38 with the hg38 baseline.
The results are so similar I won't even bother looking into comparing v1.1 and v1.2 output. The largest differences lie within the Enrichment-related columns.
However, of course this is only for the annotations I'm using for our study, so there's always a chance since you are using different annotations that your results will be different.
So yeah, you can do a liftover if you really want to (it's part of my automated pipeline I'll be publishing), but it's unnecessary I think. Based on this totally N=1 analysis I'd say you could use the hg38 baseline and hg19 sumstats. Hope that helps :)

mismatched_sumstats_build_version.png
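
For anyone who does want the liftover step mentioned above, here is a minimal sketch of the bookkeeping around it. It is not the pipeline referred to in this thread. It assumes the summary stats are already parsed into dict rows with hypothetical column names `SNP`, `CHR`, and `BP`, and that the coordinate conversion itself is done separately by the UCSC `liftOver` tool with the `hg19ToHg38.over.chain.gz` chain file. The helper names are made up for illustration.

```python
def _norm_chrom(value):
    """Return the chromosome name with a 'chr' prefix, as liftOver chain files expect."""
    value = str(value)
    return value if value.startswith("chr") else "chr" + value


def sumstats_to_bed(rows, chr_col="CHR", bp_col="BP", id_col="SNP"):
    """Yield BED lines (0-based start, 1-based end) for 1-based SNP positions.

    Column names are assumptions; adjust them to match your sumstats file.
    """
    for row in rows:
        pos = int(row[bp_col])
        yield f"{_norm_chrom(row[chr_col])}\t{pos - 1}\t{pos}\t{row[id_col]}"


# The BED file is then converted outside Python, for example:
#   liftOver sumstats_hg19.bed hg19ToHg38.over.chain.gz sumstats_hg38.bed unmapped.bed
# (file names other than the chain file are placeholders)


def apply_lifted_positions(rows, lifted_bed_lines,
                           chr_col="CHR", bp_col="BP", id_col="SNP"):
    """Return copies of rows with BP replaced by the lifted position.

    SNPs that liftOver could not map, or that moved to a different
    chromosome, are dropped rather than guessed at.
    """
    lifted = {}
    for line in lifted_bed_lines:
        chrom, _start, end, name = line.rstrip("\n").split("\t")[:4]
        lifted[name] = (chrom, end)  # BED end equals the 1-based position

    out = []
    for row in rows:
        hit = lifted.get(row[id_col])
        if hit is None or hit[0] != _norm_chrom(row[chr_col]):
            continue
        updated = dict(row)
        updated[bp_col] = hit[1]
        out.append(updated)
    return out
```

Dropping unmapped SNPs is a deliberate choice in this sketch: it keeps the lifted file internally consistent at the cost of losing a few rows, which you may want to count and report.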

Jory Schossau

Jul 31, 2019, 5:33:22 PM
to ldsc_users
I'm still going to do the 1.1 vs 1.2 analysis; that is the right question to ask. Stay tuned.

Jory Schossau

Jul 31, 2019, 6:33:25 PM
to ldsc_users
This is more believable. There are definitely statistical differences; that spike is the enrichment for Andersson Enhancer.

hg19_vs_hg38.png

Hope this helps you make an informed decision about how to proceed.

R Reynolds

Aug 1, 2019, 5:52:40 AM
to ldsc_users
Thanks so much for sharing this! This is really helpful and much appreciated. Looking at your results, I think you're right re. the major difference being between the two baseline versions v1.1 and v1.2. Definitely confirms to me that I should be working with v1.2. Will probably end up doing a liftover from hg19 to hg38, mostly for consistency, as the difference doesn't seem to be remarkable, as you also said.

Thanks again for this!!

Jory Schossau

Aug 1, 2019, 9:38:11 AM
to ldsc_users
So, let me caution you if I'm understanding you correctly. I'll summarize results from these 2 analyses.
A) performing a liftover of the summary stats changes nothing. (first figure)
B) baselines 1.1 and 1.2 are quite different. (second figure)

Conclusion: Don't bother with a liftover of summary statistics because it doesn't change anything, but do use baseline v1.2.
However, if you can regenerate the summary statistics with hg38, then that is obviously preferable!

Good luck,
 - Jory

Jory Schossau

Aug 1, 2019, 12:13:01 PM
to ldsc_users
To be extra clear, here is the difference between hg19 and hg38 using the exact same summary statistics file (pgc.scz2 in this case).
This compares the columns of the regression's .results file, one figure per column.

hg19_vs_hg38_col1.png


hg19_vs_hg38_col2.png

hg19_vs_hg38_col3.png

hg19_vs_hg38_col4.png

hg19_vs_hg38_col5.png

hg19_vs_hg38_col6.png
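
A column-by-column comparison like the one in these figures can be approximated without plotting. The sketch below is not the script used to produce the figures. It assumes each `.results` file is tab-separated with a `Category` column followed by numeric columns, and the function names are hypothetical. It reports, for each column, the largest absolute difference between two runs and the annotation where it occurs.

```python
import csv
import math


def _to_float(text):
    """Parse a numeric field, mapping unparseable values such as 'NA' to NaN."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def read_results(path):
    """Read a .results file into {category: {column: float}}."""
    table = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            category = row.pop("Category")
            table[category] = {col: _to_float(val) for col, val in row.items()}
    return table


def max_abs_diff(results_a, results_b):
    """Return {column: (largest absolute difference, category)}.

    Only categories and columns present in both runs are compared;
    NaN values are skipped.
    """
    worst = {}
    for category in results_a.keys() & results_b.keys():
        for column, value_a in results_a[category].items():
            if column not in results_b[category]:
                continue
            diff = abs(value_a - results_b[category][column])
            if math.isnan(diff):
                continue
            if column not in worst or diff > worst[column][0]:
                worst[column] = (diff, category)
    return worst
```

Usage would look like `max_abs_diff(read_results("run_a.results"), read_results("run_b.results"))`, where both file names are placeholders. Annotations that exist in only one baseline version are ignored, so a comparison across baseline versions only covers the categories they share.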


R Reynolds

Aug 20, 2019, 10:33:08 AM
to ldsc_users
Hi Jory,

Apologies for such a delayed response. You understood me correctly. That is, I intend to work with baseline v1.2 and lift over the summary stats from hg19 to hg38. The latter, however, is only for consistency (and for reviewers who might ask the question) -- as you said, it doesn't really make a difference otherwise.

Thanks again for your help,

Regina