Hi,
I am using the 1000 Genomes (1kg) Phase 3 data to calculate LD. After removing duplicate positions and non-biallelic SNPs, I ran into another problem: duplicate rs IDs.
PLINK reports the error message "Error: Duplicate ID 'rs11952502'." I found that some IDs are duplicated across different chromosomes, for example:
10 rs11952502 0 45889392 T C
23 rs11952502 0 75896726 G T
I checked all chromosomes and there are 20+ such SNPs. Would you suggest removing all of them? Alternatively, is it possible in PLINK to rename the SNPs using chrom:pos?
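To illustrate what I mean by renaming, here is a minimal Python sketch that does it outside PLINK by rewriting the .bim lines. It assumes the standard six-column .bim layout (chromosome, ID, cM, position, allele 1, allele 2) and only renames IDs that occur more than once. The function name rename_duplicate_ids and the third example row (rs123) are made up for illustration.

```python
from collections import Counter


def rename_duplicate_ids(bim_lines):
    """Rename variants whose ID occurs more than once to 'chrom:pos'.

    Assumes each line has the six whitespace-separated .bim columns:
    chromosome, variant ID, cM position, bp position, allele 1, allele 2.
    """
    rows = [line.split() for line in bim_lines if line.strip()]
    counts = Counter(row[1] for row in rows)  # column 2 is the variant ID
    renamed = []
    for chrom, snp_id, cm, pos, a1, a2 in rows:
        if counts[snp_id] > 1:
            snp_id = f"{chrom}:{pos}"  # e.g. 10:45889392
        renamed.append("\t".join([chrom, snp_id, cm, pos, a1, a2]))
    return renamed


# The two rows from my data plus one hypothetical unique variant.
example = [
    "10 rs11952502 0 45889392 T C",
    "23 rs11952502 0 75896726 G T",
    "1 rs123 0 1000 A G",
]
for line in rename_duplicate_ids(example):
    print(line)
```

If I went this way, I would also need to use the new chrom:pos names in my input SNP list for any renamed variants.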
I have another concern. I input a list of SNPs by SNP ID, and I am trying to output all SNPs within 500 kb that have r2 > 0.8 and MAF > 1%. I can do this in two ways:
1. Extract all SNPs with MAF > 1%, then input my SNP list by ID and use --ld-snps.
2. Use --ld-snps directly on all SNPs, then pick the ones with MAF > 1%.
My question is: will they give the same results? Is it possible that an input SNP has MAF < 1% in the 1kg panel (but > 1% in the GWAS panel), so that no LD SNPs can be found for it if I do the MAF filtering first?
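To make the concern concrete, here is a toy Python sketch of the two orderings. This is not how PLINK works internally. It assumes that the r2 for a pair depends only on the two SNPs involved, and all SNP names, MAF values, and r2 values are made up.

```python
def filter_then_ld(index_snps, maf, r2_pairs, maf_min=0.01, r2_min=0.8):
    """Option 1: drop SNPs at or below maf_min first, then look up LD partners."""
    kept = {snp for snp, freq in maf.items() if freq > maf_min}
    return {
        idx: {p for (a, p), r2 in r2_pairs.items()
              if a == idx and p in kept and r2 > r2_min}
        for idx in index_snps
        if idx in kept  # an index SNP removed by the MAF filter yields nothing
    }


def ld_then_filter(index_snps, maf, r2_pairs, maf_min=0.01, r2_min=0.8):
    """Option 2: look up LD partners for every index SNP, then filter partners by MAF."""
    return {
        idx: {p for (a, p), r2 in r2_pairs.items()
              if a == idx and r2 > r2_min and maf[p] > maf_min}
        for idx in index_snps
    }


# Made-up reference-panel MAFs; idxB is below 1% in this panel.
maf = {"idxA": 0.011, "idxB": 0.009, "p1": 0.011, "p2": 0.0095, "p3": 0.011}
# Made-up pairwise r2 values, keyed by (index SNP, partner SNP).
r2_pairs = {("idxA", "p1"): 0.90, ("idxA", "p2"): 0.85, ("idxB", "p3"): 0.81}

print(filter_then_ld(["idxA", "idxB"], maf, r2_pairs))
print(ld_then_filter(["idxA", "idxB"], maf, r2_pairs))
```

In this toy case the two options agree for idxA, but option 1 returns nothing for idxB because idxB itself is removed by the MAF filter, while option 2 still reports p3.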
Thank you very much!