chr:position ID for converting VCF to genotype.raw file

farnush farhadi

Jul 28, 2017, 6:43:03 PM
to plink2-users

Hi, 

I am analyzing imputation results from the Michigan Imputation Server, which are in VCF format. However, they do not use rsIDs as the SNP IDs. Instead, each SNP ID is the chromosome concatenated with the position (chr:position).

I am using plink to convert the VCF files to 012 genotype files through this: 

plink --vcf chr.vcf.gz --recodeA --out chr.012genotype

The resulting genotype files, which are in .raw format, also have chr:position as SNP IDs.

Is there any way to convert the chr:position IDs to rsIDs via plink?

Cheers,
Farnush

Christopher Chang

Jul 28, 2017, 6:56:40 PM
to plink2-users
If you have a file containing rsIDs in one column and chr:position(:ref:alt?) in another, and each value is unique, you can use --update-name for this job.
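
For anyone following along, here is a rough sketch of how such a two-column file could be built from a dbSNP VCF. The function name make_update_name_map and the file names in the note below are made up for illustration. The sketch assumes the IDs in your data are exactly chr:position and use the same chromosome naming as dbSNP. Positions with more than one dbSNP record are dropped so that every value stays unique. Everything is held in memory, so it is probably best run on one chromosome at a time.

```shell
# Hypothetical helper (not part of plink): reads an uncompressed dbSNP-style
# VCF on stdin and prints "chr:pos<TAB>rsID" for positions seen exactly once.
make_update_name_map() {
    awk -F'\t' '
        /^#/ { next }                          # skip VCF header lines
        $3 ~ /^rs/ {
            key = $1 ":" $2                    # chr:position, as in the imputed IDs
            count[key]++
            id[key] = $3
            if (count[key] == 1) order[++n] = key   # remember first-seen order
        }
        END {
            # emit only unambiguous positions, in input order
            for (i = 1; i <= n; i++)
                if (count[order[i]] == 1) print order[i] "\t" id[order[i]]
        }
    '
}
```

It might be used as `zcat All_20161121.vcf.gz | make_update_name_map > chrpos_to_rsid.txt`, followed by `plink --vcf chr.vcf.gz --update-name chrpos_to_rsid.txt --recodeA --out chr.012genotype`. By default --update-name takes the old ID from column 1 and the new ID from column 2, which matches the output order here.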

farnush farhadi

Jul 30, 2017, 8:36:41 PM
to plink2-users
Hi, 

Thank you for your reply. No, I do not have a file with columns like the ones you described. How can I get one? Do you know of any useful resources for that?

Cheers,
Farnush 

Owen Wilkins

Aug 31, 2017, 11:15:48 AM
to plink2-users
Did anyone ever figure out a way to do this? I am having the same issue....

Chris Chang

Sep 1, 2017, 3:20:55 AM
to Jon Chung, plink2-users
FYI, if you want to use "--vcf /dev/stdin", you'll need to stick with plink 1.9.  The v2.0 VCF importer is a lot more powerful, but some of this power comes from making two passes over the input VCF file instead of one, which a pipe cannot support.

On Thu, Aug 31, 2017 at 9:04 AM, Jon Chung <jwb...@gmail.com> wrote:
There are many ways to add rsIDs to a VCF file. I'll try to give some concrete examples of taking a VCF, annotating it with an annotation VCF from dbSNP, and converting it to a genotype format file using plink. One cool thing to note is that Plink1.9 can read VCF data from stdin, making it easy to chain programs together. I don't think this works with Plink2 yet.

Download the VCF file for your relevant build from dbSNP.
  1. GRCh37 can be found here: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b149_GRCh37p13/VCF/All_20161121.vcf.gz
  2. Also grab the matching tabix index (.tbi) file.

Then use something like the SnpSift annotate function to annotate one VCF with another:
  • Use the `-id` flag to annotate only the ID column.
java -jar SnpSift.jar annotate \
    -id \
    All_20161121.vcf.gz \
    input.vcf.gz \
| plink \
    --vcf /dev/stdin \
    --keep-allele-order \
    --double-id \
    --recodeA \
    --out output.012genotype


Here's an alternative method using bcftools, from the Adventures in Bioinformatics and genetics for fun blogs:
  • It also includes an optional step to update the variant ID with the ref and alt alleles.
  • If you have multi-allelic sites, it's a good idea to decompose and normalize them using bcftools, as in the genetics for fun blog post.
bcftools annotate \
    --output-type v \
    --remove ID \
    --set-id +'%CHROM:%POS:%REF:%ALT' \
    input.vcf.gz \
| bcftools annotate \
    --annotations All_20161121.vcf.gz \
    --columns ID \
    --output-type v \
| plink \
    --vcf /dev/stdin \
    --keep-allele-order \
    --double-id \
    --recodeA \
    --out output.012genotype
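
Not every imputed site will have a dbSNP record, so some IDs will likely stay in chr:position form after annotation. Below is a small sketch for checking how many variants actually received an rsID. It assumes the annotated VCF was also written to a file instead of only being piped into plink; the function name count_rsid_coverage and the file name annotated.vcf.gz are hypothetical.

```shell
# Hypothetical helper: reads an uncompressed VCF on stdin and reports how many
# variant IDs look like rsIDs and how many do not.
count_rsid_coverage() {
    awk -F'\t' '
        /^#/ { next }                       # skip VCF header lines
        $3 ~ /^rs[0-9]+/ { named++; next }  # ID column starts with an rs number
        { other++ }                         # anything else, e.g. chr:pos:ref:alt
        END { printf "rsID\t%d\nother\t%d\n", named, other }
    '
}
```

For example: `zcat annotated.vcf.gz | count_rsid_coverage`.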

Hope that helps.
