There are many ways to add rsIDs to a VCF file. I'll try to give a concrete example of starting from a VCF, annotating it with an annotation VCF from dbSNP, and converting it to a genotype-format file using plink. One cool thing to note is that Plink 1.9 can read VCF data from stdin, making it easy to chain programs together. I don't think this works with Plink2 yet.
Download the VCF file for your relevant build from dbSNP.
- GRCh37 can be found here: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b149_GRCh37p13/VCF/All_20161121.vcf.gz
- Also grab the tabix index that goes with it.
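If it helps, here is a rough sketch of fetching both files in one go. The base URL and file name are the ones from the link above; the `.tbi` suffix for the index is an assumption based on the usual tabix naming, and the function names `dbsnp_urls` and `fetch_dbsnp` are made up for illustration. It also assumes `curl` is installed with FTP support and that the server still hosts this path.

```shell
# Base URL and file name come from the link above.
base_url="ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b149_GRCh37p13/VCF"
vcf="All_20161121.vcf.gz"

# Print the two URLs to fetch: the bgzipped VCF and its tabix index.
# The ".tbi" suffix is an assumption (standard tabix naming).
dbsnp_urls() {
  printf '%s/%s\n' "$base_url" "$vcf"
  printf '%s/%s.tbi\n' "$base_url" "$vcf"
}

# Download both files into the current directory.
# Nothing is downloaded until you call this function yourself.
fetch_dbsnp() {
  dbsnp_urls | while read -r url; do
    curl --fail --remote-name "$url"
  done
}

# Show what would be downloaded.
dbsnp_urls
```

Run `fetch_dbsnp` once the printed URLs look right for your build.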
Then use something like the SnpSift `annotate` command to annotate one VCF with another:
- Use the `-id` flag to annotate only the ID column.
    java -jar SnpSift.jar annotate \
      -id \
      All_20161121.vcf.gz \
      input.vcf.gz \
    | plink \
      --vcf /dev/stdin \
      --keep-allele-order \
      --double-id \
      --recodeA \
      --out output.012genotype
Here's an alternative method using bcftools, from the Adventures in Bioinformatics and genetics for fun blogs:
- It includes an optional first step that sets the variant ID from the chromosome, position, ref and alt alleles, so variants without a dbSNP match still end up with a usable ID.
- If you have multi-allelic sites, it's a good idea to decompose and normalize them using bcftools, as in the genetics for fun blog post.
    bcftools annotate \
      --output-type v \
      --remove ID \
      --set-id +'%CHROM:%POS:%REF:%ALT' \
      input.vcf.gz \
    | bcftools annotate \
      --annotations All_20161121.vcf.gz \
      --columns ID \
      --output-type v \
    | plink \
      --vcf /dev/stdin \
      --keep-allele-order \
      --double-id \
      --recodeA \
      --out output.012genotype
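Before piping straight into plink, it can be worth checking how many variants actually picked up an rsID. Here is a minimal sketch of that check using plain awk; it assumes you have written the annotated VCF to a file or can pipe it in uncompressed, and the function name `count_rsids` is just something I made up for illustration.

```shell
# Count body records whose ID column (column 3) starts with "rs".
# Reads an uncompressed VCF on stdin; header lines (starting with '#') are skipped.
count_rsids() {
  awk -F'\t' '
    !/^#/ && NF >= 3 {
      total++
      if ($3 ~ /^rs[0-9]+/) hit++
    }
    END { printf "%d of %d variants carry an rsID\n", hit, total }
  '
}

# Tiny made-up example: one rsID, one chrom:pos:ref:alt ID, one missing ID.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs123\tA\tG\t.\t.\t.\n1\t200\t1:200:C:T\tC\tT\t.\t.\t.\n1\t300\t.\tG\tA\t.\t.\t.\n' \
  | count_rsids
# prints: 1 of 3 variants carry an rsID
```

For a compressed file, something like `gzip -dc annotated.vcf.gz | count_rsids` should do, where `annotated.vcf.gz` is a placeholder for wherever you saved the annotated output.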
Hope that helps.