Difference of LD score estimates using ldsc and GCTA

535 views
Skip to first unread message

Kangcheng Hou

unread,
Aug 4, 2018, 9:46:47 PM8/4/18
to ldsc_users
Hi, thanks for the amazing software. I tried to use ldsc and GCTA to compute the total SNP-heritability.
The scripts for ldsc is 
for (( chr_i=1; chr_i <= 22; ++chr_i ))
do
    python "$ldsc_dir"/ldsc.py\
        --bfile "$plink_prefix""$chr_i" \
        --l2\
        --ld-wind-kb 100\
        --out "$ldsc_out_dir"/$chr_i
done
and the scripts for gcta is
for (( chr_i=1; chr_i <= 22; ++chr_i ))
do
    ./gcta64 \
        --bfile $plink_prefix"$chr_i" \
        --ld-score \
        --ld-wind 100\
        --ld-rsq-cutoff 0\
        --ld-score-adj \
        --out "$gcta_out_dir"/$chr_i\
        --thread-num 8
done
The output for these two softwares for chromosome 22 is
ldsc
CHR     SNP     BP      L2
22      rs62224609      16051249        14.264
22      rs4965031       16052080        3.546
22      rs139918843     16052684        3.033
22      rs376238049     16052962        13.642
22      rs200777521     16052986        4.478

GCTA
SNP chr bp MAF mean_rsq snp_num max_rsq ldscore
rs62224609 22 16051249 0.0934394 0.117427 85 1.00001 10.9813
rs4965031 22 16052080 0.119284 0.0285953 85 0.999995 3.4306
rs139918843 22 16052684 0.0139165 0.0164603 85 0.713577 2.39912
rs376238049 22 16052962 0.0775348 0.108505 85 1 10.2229
rs200777521 22 16052986 0.0606362 0.0369478 85 1 4.14056
I am wondering why there are differences between the two software, am I using them in a wrong way? Thanks!




Raymond Walters

unread,
Aug 6, 2018, 6:41:08 PM8/6/18
to Kangcheng Hou, ldsc_users
Hi,

I haven't fully investigated how GCTA computes LD scores, but my impression is that there are some minor implementation details that will lead the computed values to different slightly (as you've observed). Those differences include the rsq cutoff (--ld-rsq-cutoff -1 would be closer to ldsc's implementation), the implementation of windowing for the LD score calculations (GCTA splitting the genome into fixed overlapping chunks, vs. ldsc using SNPs within a given distance), and potential differences in the default MAF threshold.

In practice, the impact of these differences on LD score regression should be fairly minor. I'd be curious if you're seeing meaningful differences in h2 estimation off of these two sets of LD scores?

Cheers,
Raymond



--
You received this message because you are subscribed to the Google Groups "ldsc_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ldsc_users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ldsc_users/7cfb1828-13b3-4eb4-ba0d-bb241c599d80%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kangcheng Hou

unread,
Aug 6, 2018, 6:52:41 PM8/6/18
to ldsc_users
Thanks for your reply!
In fact, I tried to use software ldsc to evaluate the heritability using the LD score computed from ldsc and GCTA respectively and get very different estimates.
Here is the script I use:
# ldsc
python "$src_dir"/ldsc.py \
    --h2 $sumstats_dir/cau_ratio_"$cau_ratio"_hsq_"$hsq"_sim_"$sim_i".txt \
    --ref-ld-chr $ldsc_ldscore_dir/ \
    --w-ld-chr $ldsc_ldscore_dir/ \
    --out "$ldsc_out_dir"/cau_ratio_"$cau_ratio"_hsq_"$hsq"_sim_"$sim_i"
# gcta
python "$src_dir"/ldsc.py \
    --h2 $sumstats_dir/cau_ratio_"$cau_ratio"_hsq_"$hsq"_sim_"$sim_i".txt \
    --ref-ld-chr $gcta_ldscore_dir/ \
    --w-ld-chr $gcta_ldscore_dir/ \
    --out "$gcta_out_dir"/cau_ratio_"$cau_ratio"_hsq_"$hsq"_sim_"$sim_i"
For ldsc, the output log is like
Total Observed scale h2: 0.6521 (0.0229)
Lambda GC: 3.8227
Mean Chi^2: 5.0959
Intercept: 1.2839 (0.0241)
Ratio: 0.0693 (0.0059)
For gcta, the output log is like
Total Observed scale h2: 1.084 (0.0476)
Lambda GC: 3.8227
Mean Chi^2: 5.0959
Intercept: 1.4536 (0.0326)
Ratio: 0.1107 (0.008)
I used the same data for comparison. 

Also, I tried to plot the correlation between the ld scores computed from ldsc and gcta.

在 2018年8月6日星期一 UTC-7下午3:41:08,Raymond Walters写道:

Raymond Walters

unread,
Aug 6, 2018, 7:35:07 PM8/6/18
to Kangcheng Hou, ldsc_users
Interesting! Changing --ld-rsq-cutoff and harmonizing the treatment of MAF (including the M_5_50 calculation, which I'm not sure GCTA outputs?) would be the main places I'd start in determining the source of the difference. 
Cheers,
Raymond



Reply all
Reply to author
Forward
0 new messages