Hello PLINK developer,
Thank you so much for developing this tool. I am new to the GWAS field, so I would much appreciate your help in clearing up some confusion about calculating the LD matrix.
I was using PAINTOR for fine mapping, as asked by my PI. However, I ran into errors with PAINTOR's CalcLD_1KG_VCF script that I cannot fix, because the GitHub repository is no longer maintained.
So I tried PLINK instead, with the following command:
plink --vcf ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --r --out resultLDchr10
Question 1: I thought I needed plink2 for --vcf, but when I change plink to plink2, --r is not recognized: Error: Unrecognized flag ('--r').
Question 2: I would like to know how to use the plink2-formatted v5a files for LD matrix calculation. For example, I downloaded chr10_phase3.pgen.zst and chr10_phase3.pvar.zst. After decompressing them, which one should I use, and with which flag (maybe --pgen and --pvar)?
Question 3: I have a list of SNPs from a public GWAS summary statistics dataset. Should I use --snps to supply them for the LD matrix calculation? And what format should the file be in for --snps?
Question 4: This question compares PAINTOR and PLINK. CalcLD_1KG_VCF.py has the flag "--map_file [-m] specify reference map file that maps population ids to individuals". Does PLINK have a similar flag, or do I not need one in PLINK?
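To make Question 3 concrete, here is a minimal sketch of how such a SNP list could be pulled out of a summary statistics file, with one variant ID per line. The file names and the column name "SNP" are assumptions made up for illustration, and I do not know whether this is the format PLINK expects.

```shell
# Toy summary statistics file (hypothetical name, columns, and values).
printf 'SNP\tCHR\tBP\tP\nrs123\t10\t1000\t0.01\nrs456\t10\t2000\t0.20\n' > sumstats_example.tsv

# Locate the column named "SNP" in the header row, then print
# that column for every data row, one variant ID per line.
awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "SNP") col = i; next }
  col     { print $col }
' sumstats_example.tsv > snplist_example.txt
```

This writes the IDs from the toy file to snplist_example.txt; a real summary statistics file may use a different delimiter or a different column name.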
After running the above command, I also got a file named resultLDchr10.nosex. Here is part of the .log file:
3992219 variants loaded from .bim file.
2504 people (0 males, 0 females, 2504 ambiguous) loaded from .fam.
Ambiguous sex IDs written to resultLDchr10.nosex .
Using up to 39 threads (change this with --threads).
Before main variant filters, 2504 founders and 0 nonfounders present.
I am not sure whether this resultLDchr10.nosex file affects the LD calculation.
Thank you!