Hello,
I have been looking through the plink 1.9 documentation and saw that it is possible to calculate LD scores between a list of SNPs with the “--ld-snp-list” option and also to perform LD pruning of SNPs using the “--indep”, “--indep-pairwise” and other options. However, is it possible to perform LD pruning restricted to a given list of SNPs?
In the past, I performed LD pruning of SNPs using a VCF file containing genotype information for all chromosomes from the 1000 Genomes phase 3 release (http://www.internationalgenome.org/category/release/). Note that the VCF file contained several million SNPs and was very large (>15 GB).
First I generated plink format files from the VCF:
plink --vcf all.chr.concat.vcf.gz --maf 0.05 --recode --out all.chr.genotypes --keep-autoconv
Then I used the independent pairwise option on the .ped/.map files with the command:
plink --file all.chr.genotypes --indep-pairwise 1000 kb 5 0.2 --r2
This gave me output files listing all SNPs in LD that were removed (“prune.out” file) and all SNPs not in LD that were kept (“prune.in” file). I was wondering if it would be possible to obtain similar “prune.in” and “prune.out” output files while using the “--ld-snp-list” option.
The type of analysis I was thinking of doing was something along the lines of:
plink --file all.chr.genotypes --indep-pairwise 1000 kb 5 0.2 --ld-snp-list Type_2_diabetes.SNPs.txt --r2 --out T2D.LD.SNPs
where “Type_2_diabetes.SNPs.txt” contains a list of diabetes GWAS SNP identifiers:
rs864745
rs12779790
rs7961581
rs7578597
…
However, when I run this analysis, it performs the LD pruning on all SNPs in my .ped/.map files and does not restrict the pruning to the SNPs in “Type_2_diabetes.SNPs.txt”.
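One variation that might give the behaviour described above is to filter the dataset down to the listed SNPs first and prune afterwards. The sketch below assumes that plink's “--extract” filter (which keeps only the variant IDs listed in a file) is applied before “--indep-pairwise” runs, and that the file names match the ones used above. It drops “--r2”, since that flag produces a separate LD report rather than affecting the pruning. It is untested on this dataset, so treat it as a starting point rather than a confirmed solution.

```shell
# Sketch only: restrict the data to the listed SNPs, then LD-prune among them.
# File names are the ones from the post; adjust as needed.
fileset=all.chr.genotypes
snp_list=Type_2_diabetes.SNPs.txt
out_prefix=T2D.LD.SNPs

# --extract keeps only variants whose IDs appear in the list file,
# so pruning should see just those SNPs (assumption: filtering happens first).
cmd="plink --file $fileset --extract $snp_list --indep-pairwise 1000 kb 5 0.2 --out $out_prefix"
echo "$cmd"

# Run the command only when plink and the input files are actually present.
if command -v plink >/dev/null 2>&1 && [ -f "$fileset.ped" ] && [ -f "$snp_list" ]; then
  $cmd
fi
```

If this works as assumed, “T2D.LD.SNPs.prune.in” and “T2D.LD.SNPs.prune.out” would contain only SNPs from the list.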
I would appreciate any help/feedback and please let me know if I can provide more information.
-Nathan