Reg. Updating SNP ids for SNPs on Illumina HumanOmni2.5M array


SSwami...@tgen.org

Jun 7, 2013, 9:26:23 PM
to gen...@soe.ucsc.edu
Dear Madam/Sir

We have genotype data for SNPs genotyped on the Illumina HumanOmni2.5M array in PLINK format (hg19/Build 37 coordinates). The majority of the SNPs have dbSNP rs IDs, but some do not and are labelled differently. We would like to impute the data using the 1000 Genomes data and hence would like to update the SNP IDs based on the latest dbSNP version.

Could you please tell us whether it is possible to use your website to update the SNP IDs in our data for the SNPs that do not have rs IDs? If so, could you tell us how we should go about it? If not, are there any other resources?

Thank you for your help. We look forward to your early reply.

Yours sincerely
Shanker Swaminathan

Postdoctoral Fellow
Matthew Huentelman's Lab
TGen

Steve Heitner

Jun 10, 2013, 6:18:54 PM
to SSwami...@tgen.org, gen...@soe.ucsc.edu

Hello, Shanker.

Could you please provide me with some sample lines of your data file which include some lines that do have the correct rs IDs and some that do not?

Please contact us again at gen...@soe.ucsc.edu if you have any further questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group


SSwami...@tgen.org

Jun 10, 2013, 6:33:51 PM
to st...@soe.ucsc.edu, gen...@soe.ucsc.edu
Dear Dr. Heitner

Thank you for your reply. Here are some sample lines from the PLINK map file for chromosome 1:

1 kgp499505 5.8106 4158540 A G
1 rs10915428 5.81189 4158955 C A
1 rs7523426 5.81794 4160904 A G
1 kgp15768097 5.81915 4161297 A G
1 kgp6263858 5.8221 4162248 G A
1 kgp9874326 5.82266 4162427 A G
1 kgp2327296 5.82308 4162563 G A
1 rs12073797 5.83125 4165198 A G
1 kgp15272462 5.83197 4165428 A G
1 kgp652123 5.83381 4166023 A C
1 kgp2921185 5.83626 4166814 G A
1 kgp3307364 5.83778 4167304 A G
1 kgp10493391 5.8441 4169341 A G
1 rs12077262 5.84629 4170048 A G
1 rs12032575 5.8467 4170178 A G
1 rs7548756 5.84692 4170249 C A
1 rs10492944 5.85168 4171786 A G
1 kgp7915723 5.85221 4171956 A G
1 rs10492945 5.85806 4173841 G A

Would it be possible to convert the non-rs IDs to rs IDs? We would like to impute the data using the 1000 Genomes reference data and have been asked to label all SNPs with their rs IDs.

Thank you
Yours sincerely
Shanker Swaminathan


Steve Heitner

Jun 11, 2013, 7:07:56 PM
to SSwami...@tgen.org, gen...@soe.ucsc.edu

Hello, Shanker.

It is possible to replace the kgp IDs in your file with rs IDs, but the solution is slightly complicated.  The general strategy is the following:

1. Create a BED file of genomic coordinates from your map file
2. Use the UCSC Table Browser and the BED file created in step 1 to create a new BED file with the proper rs IDs
3. Use the new BED file created in step 2 to identify which rs IDs cross reference with which kgp IDs

From the sample data you have shown me from your map file, it appears that the “1” preceding the SNP ID is the chromosome number and the fourth column is the chromosomal coordinate.  So your first entry is:

1 kgp499505 5.8106 4158540 A G

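The corresponding entry in your resulting BED file should be as follows (BED start coordinates are 0-based, so the start is the map position minus one and the end is the map position itself):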

chr1 4158539 4158540 kgp499505

Using this convention, the first 3 lines of your BED file should be:

chr1 4158539 4158540 kgp499505
chr1 4158954 4158955 rs10915428
chr1 4160903 4160904 rs7523426

To obtain this BED file, if you are comfortable with writing basic scripts, you can write a script to parse your map file and output the appropriate BED file.  If not, there are tools available at Galaxy (https://main.g2.bx.psu.edu/) that can help you with this.  See specifically the tools under “Text Manipulation”.  Please contact Galaxy support (http://wiki.galaxyproject.org/Support) for any questions related to Galaxy tools.
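For readers who prefer the scripting route, here is a minimal Python sketch of such a parser. It assumes the column layout of the sample lines in this thread (chromosome, SNP ID, genetic distance, base-pair position, then alleles) and numeric autosome codes only; the function names and file paths are hypothetical, and chromosome codes for X, Y or mitochondrial SNPs would need to be translated to UCSC names separately.

```python
def map_line_to_bed(line):
    """Convert one map-file line to a 4-column BED line.

    Assumed input columns: chrom, SNP ID, genetic distance, bp position, ...
    """
    fields = line.split()
    chrom, snp_id, position = fields[0], fields[1], int(fields[3])
    # BED starts are 0-based and ends are exclusive, so a 1-based SNP
    # position p becomes the interval [p - 1, p).
    return f"chr{chrom}\t{position - 1}\t{position}\t{snp_id}"


def map_to_bed(map_path, bed_path):
    """Write a BED file for every non-blank line of the map file."""
    with open(map_path) as src, open(bed_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(map_line_to_bed(line) + "\n")


# The first three sample lines posted earlier in this thread.
sample = [
    "1 kgp499505 5.8106 4158540 A G",
    "1 rs10915428 5.81189 4158955 C A",
    "1 rs7523426 5.81794 4160904 A G",
]
for line in sample:
    print(map_line_to_bed(line))
```

The printed lines are tab-separated versions of the three BED lines shown above. Calling map_to_bed("input.map", "output.bed") (both file names are placeholders) would convert a whole file.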

Once you have obtained this BED file, you will use our Table Browser to obtain the list of SNP IDs corresponding with the regions defined in the BED file.  If you are unfamiliar with the Table Browser, please see the User’s Guide at http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html.

Perform the following steps:

1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables

2. Select the following options:
Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Variation and Repeats
Track: All SNPs(137)
Table: snp137
Output format: selected fields from primary and related tables

3. On the “region” line, click the “define regions” button.  The only caveat with defining regions is that you are limited to 1,000 regions at a time.

4. Click the “Browse” button to select the BED file you created earlier

5. Click the “submit” button

6. You can enter a filename on the “output file” line or leave it blank to see the results on the screen

7. Click the “get output” button

8. In the “Select Fields from hg19.snp137” section, check the chrom, chromStart, chromEnd and name checkboxes

9. Click the “get output” button

At this point, if you only care about having the correct rs IDs and you don’t care about which rs ID goes with which kgp ID, you are done.  If you want to associate rs IDs with kgp IDs, again, you can either write a custom script to compare the two files or you can use Galaxy tools.  See specifically the “Operate on Genomic Intervals/Join” tool.
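As an alternative to the Galaxy join, the comparison of the two files can be sketched in Python. This is only an illustration under stated assumptions: both inputs have chrom, chromStart, chromEnd and name as their first four whitespace-separated columns, header lines in the Table Browser output begin with “#”, and a match means an identical interval. Entries in the SNP table that are not single-base (for example insertions or multi-base variants) will not match this way, and one position can carry more than one rs ID, so each array ID maps to a list. The function names are hypothetical, and the rs ID paired with kgp499505 below is a made-up placeholder, not a real lookup result.

```python
from collections import defaultdict


def read_bed(lines):
    """Yield (chrom, start, end, name) tuples, skipping blanks and '#' headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, name = line.split()[:4]
        yield chrom, int(start), int(end), name


def match_ids(array_bed_lines, snp_table_lines):
    """Map each array ID to the rs IDs that share its exact BED interval."""
    rs_by_interval = defaultdict(list)
    for chrom, start, end, name in read_bed(snp_table_lines):
        rs_by_interval[(chrom, start, end)].append(name)
    return {
        name: rs_by_interval.get((chrom, start, end), [])
        for chrom, start, end, name in read_bed(array_bed_lines)
    }


# BED lines built from the map file (taken from this thread).
array_bed = [
    "chr1 4158539 4158540 kgp499505",
    "chr1 4158954 4158955 rs10915428",
]
# Stand-in for Table Browser output; "rs_PLACEHOLDER" is NOT a real rs ID.
snp_table = [
    "#chrom\tchromStart\tchromEnd\tname",
    "chr1\t4158539\t4158540\trs_PLACEHOLDER",
    "chr1\t4158954\t4158955\trs10915428",
]
mapping = match_ids(array_bed, snp_table)
for array_id, rs_ids in mapping.items():
    print(array_id, rs_ids)
```

Array IDs that come back with an empty list have no exact-interval match in the SNP table and would need to be checked by hand.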

Considering the limitation of defining 1,000 regions at a time, if you have an overwhelmingly large number of regions, you may consider only adding the kgp IDs to your initial BED file.  If that is still prohibitive, we may need to consider alternate solutions.
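One way to work within the 1,000-region limit, sketched in Python: keep only the BED lines whose name does not start with “rs”, then split them into files of at most 1,000 lines, each of which can be uploaded separately. The function names and the output file naming pattern are hypothetical, and the input below is synthetic data used only to show the chunk sizes.

```python
def non_rs_lines(bed_lines):
    """Keep BED lines whose name (4th column) does not start with 'rs'."""
    kept = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 4 and not fields[3].startswith("rs"):
            kept.append(line.rstrip("\n"))
    return kept


def chunks(items, size=1000):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_chunks(bed_lines, prefix, size=1000):
    """Write each chunk to <prefix>_<n>.bed and return the file names."""
    names = []
    for n, chunk in enumerate(chunks(non_rs_lines(bed_lines), size), start=1):
        name = f"{prefix}_{n:04d}.bed"
        with open(name, "w") as out:
            out.write("\n".join(chunk) + "\n")
        names.append(name)
    return names


# Synthetic input: 2,500 made-up non-rs entries plus one rs entry.
synthetic = [f"chr1\t{i}\t{i + 1}\tkgp_synthetic_{i}" for i in range(2500)]
synthetic.append("chr1\t4158954\t4158955\trs10915428")
sizes = [len(c) for c in chunks(non_rs_lines(synthetic))]
print(sizes)
```

With a very large number of non-rs SNPs this still means many separate uploads, so it does not remove the need for the alternate solutions mentioned above.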



Please contact us again at gen...@soe.ucsc.edu if you have any further questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

SSwami...@tgen.org

Jun 11, 2013, 8:27:02 PM
to st...@soe.ucsc.edu, gen...@soe.ucsc.edu
Dear Dr. Heitner

Thank you very much for the detailed explanation. I will take a look at this option and will get back to you if I have any questions.

Thank you
Yours sincerely
Shanker Swaminathan
