Hi,
I have generated some genome-wide annotations that I would like to use alongside the "baseline" model when partitioning heritability. I followed the instructions for generating annotation files and LD scores with the 1000G Phase 3 data and HapMap 3 SNPs given here:
https://github.com/bulik/ldsc/wiki/LD-Score-Estimation-Tutorial
All of this seemed to work until I tried partitioning heritability. I am getting the following error: "ValueError: LD Scores for concatenation must have identical SNP columns."
Comparing the two LD score files (baseline and my own), I see that their SNP sets differ: neither file is a superset of the other, and each contains SNPs that are missing from the other.
For example, on chromosome 19:

```r
library(data.table)

bt <- fread("1000G_EUR_Phase3_baseline/baseline.19.l2.ldscore.gz", data.table = FALSE, stringsAsFactors = FALSE)
rv <- fread("rarevar.1000G.EUR.QC.19.l2.ldscore", data.table = FALSE, stringsAsFactors = FALSE)

head(bt$SNP)
# [1] "rs8100066"  "rs8105536"  "rs2312724"  "rs1020382"  "rs12459906" "rs11084928"
head(rv$SNP)
# [1] "rs8100066"  "rs8102615"  "rs8105536"  "rs2312724"  "rs1020382"  "rs12459906"

# Baseline SNPs missing from my annotation's LD scores
sum(!(bt$SNP %in% rv$SNP))
# [1] 2051

# My annotation's SNPs missing from the baseline LD scores
sum(!(rv$SNP %in% bt$SNP))
# [1] 1533
```
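To see exactly which SNPs differ, a small base-R helper along these lines could be used. This is only a sketch: `compare_snp_columns` is a hypothetical name, not part of ldsc or data.table. Since the error asks for "identical SNP columns", the check in ldsc may also be sensitive to SNP order, so the helper reports whether the shared SNPs appear in the same order as well.

```r
# Hypothetical helper: summarise how two SNP ID vectors differ,
# e.g. the SNP columns of two .l2.ldscore files for the same chromosome.
compare_snp_columns <- function(a, b) {
  shared <- intersect(a, b)
  list(
    only_in_a = setdiff(a, b),  # unique IDs present in a but not in b
    only_in_b = setdiff(b, a),  # unique IDs present in b but not in a
    n_shared  = length(shared),
    # TRUE when the shared SNPs appear in the same relative order in both
    same_order_shared = identical(a[a %in% shared], b[b %in% shared]),
    # TRUE only when the two columns match element for element
    identical = identical(a, b)
  )
}

# Toy example using the first few IDs shown above
bt_snps <- c("rs8100066", "rs8105536", "rs2312724", "rs1020382", "rs12459906", "rs11084928")
rv_snps <- c("rs8100066", "rs8102615", "rs8105536", "rs2312724", "rs1020382", "rs12459906")
res <- compare_snp_columns(bt_snps, rv_snps)
res$only_in_a
# [1] "rs11084928"
res$only_in_b
# [1] "rs8102615"
```

On the real files, `compare_snp_columns(bt$SNP, rv$SNP)` would return the discordant IDs themselves, which could then be looked up in the HapMap 3 SNP list or the chromosome 19 PLINK files. Note that `setdiff` returns unique IDs, so its lengths can differ from the `%in%` counts above if a file contains duplicated SNP IDs.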
I used the PLINK files and HapMap 3 SNPs recommended in the documentation: 1000G_Phase3_plinkfiles.tgz and hapmap3_snps.tgz.
Any ideas as to why I'm not getting the same SNPs would be appreciated.
Thanks.