LD Calculation - Plink with GWAS SNPs


Ze Tristan Shi

Nov 1, 2022, 6:04:12 PM
to plink2-users
Hello PLINK developer,

Thank you so much for developing this tool. I am new to the GWAS field, so I would much appreciate your help with some confusion I have about calculating the LD matrix.

I was using PAINTOR for fine mapping, as asked by my PI. However, I ran into errors with PAINTOR when using CalcLD_1KG_VCF, and they cannot be fixed since the GitHub repository is no longer maintained.
So I decided to use Plink for LD. After reading a previous thread (https://groups.google.com/g/plink2-users/c/9vhaoFZfHQ0/m/pxRdGHHiBAAJ), I gather there is something wrong with the v5b files, but I was able to find v5a somewhere: ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.

So I was able to try:
plink --vcf ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --r --out resultLDchr10

Question 1: I thought I needed plink2 for --vcf, but when I change plink to plink2, --r is not recognized: Error: Unrecognized flag ('--r').

Question 2: I would like to know how to use plink2-formatted v5a files for LD matrix calculation. For example, I downloaded chr10_phase3.pgen.zst and chr10_phase3.pvar.zst. After decompressing them, I wonder which one I should use, and which flag to use as well (maybe --pgen and --pvar)?

Question 3: I have a list of SNPs from a public GWAS summary statistics dataset. Should I use --snps to supply them for the LD matrix calculation? And what format should the input be for --snps?

Question 4: This question compares PAINTOR and PLINK. CalcLD_1KG_VCF.py has a flag, --map_file [-m], which specifies a reference map file that maps population IDs to individuals. I wonder whether PLINK has a similar flag, or whether I don't need it in PLINK.
After running the above command, I have a file named resultLDchr10.nosex. Here is a part of the information from .log file: 
3992219 variants loaded from .bim file.
2504 people (0 males, 0 females, 2504 ambiguous) loaded from .fam.
Ambiguous sex IDs written to resultLDchr10.nosex .
Using up to 39 threads (change this with --threads).
Before main variant filters, 2504 founders and 0 nonfounders present.

I am not sure if this resultLDchr10.nosex would influence the LD calculation or not. 

Thank you!
Ze


Christopher Chang

Nov 2, 2022, 12:19:55 PM
to plink2-users
1. plink 1.9 does technically have a --vcf flag, but yes, you're usually better off pretending it doesn't exist.  With that said, plink 2.0 is still missing a few commands that are implemented in plink 1.9, and --r is one of these.  The standard workflow here is to (i) use plink 2.0 to downcode to .bed with --make-bed (adding "--max-alleles 2" if necessary) to generate a fileset that plink 1.9 can read, and then (ii) use plink 1.9 to execute --r.

2. You use --pfile on a .pgen + .pvar + .psam fileset the same way you use --bfile on a .bed + .bim + .fam fileset.

3. If you only want to compute LD between pairs of SNPs in the file, --extract is the flag you're looking for.  (You can use --extract during the --make-bed operation to export a smaller .bed; you don't need to wait till --r.)

4. The LD calculation does not require sex information, unless chrX or chrY are involved.
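
For reference, the workflow in points 1-3 could be sketched roughly as below. This is only an illustration under stated assumptions, not a tested recipe: the file prefixes and the SNP list name are placeholders, the .pgen, .pvar and .psam files are assumed to be decompressed and to share one prefix (if they don't, plink 2.0 also accepts --pgen, --pvar and --psam separately), and the SNP list is assumed to be a plain text file with one variant ID per line, matching the IDs in the .pvar. The "square" modifier on --r is added here to request a full matrix; the small run() helper is hypothetical and only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Placeholder names (assumptions, not taken from the thread)
PFILE=chr10_phase3        # prefix of the .pgen/.pvar/.psam fileset
SNPLIST=gwas_snps.txt     # one variant ID per line
BED=chr10_gwas            # prefix for the exported .bed/.bim/.fam
OUT=resultLDchr10         # prefix for the LD output

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step (i), plink 2.0: keep only the GWAS SNPs, drop multiallelic variants,
# and export a fileset that plink 1.9 can read.
run plink2 --pfile "$PFILE" --extract "$SNPLIST" --max-alleles 2 \
    --make-bed --out "$BED"

# Step (ii), plink 1.9: compute r between the remaining SNPs as a square matrix.
run plink --bfile "$BED" --r square --out "$OUT"
```

Running the script as-is only prints the two commands; running it with DRY_RUN=0 executes them, which requires both plink2 and plink (1.9) on the PATH.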