Get rs ids using chr coordinates


Mahantesh Biradar

Aug 12, 2020, 11:32:12 AM
to UCSC Genome Browser Discussion List
Hello UCSC team

I have: CHR START END Effect_Allele Ref_Allele columns (hg38)

I want:  CHR START END Effect_Allele Ref_Allele RS_ID

I would like to get the SNP rs ID for each row using its chromosome coordinates. Please let me know how I can do this.

Thanks in advance,
Mahantesh 

Brian Lee

Aug 14, 2020, 4:28:11 PM
to Mahantesh Biradar, UCSC Genome Browser Discussion List

Dear Mahantesh,

Thank you for using the UCSC Genome Browser and for your question about retrieving rs IDs for coordinate ranges.

These data are stored in a format called bigBed, and our bigBedToBed tool lets you specify a region and extract the matching lines of data from a bigBed file. This FAQ, http://genome.ucsc.edu/FAQ/FAQdownloads.html#snp, includes an example like the one below, which extracts data for the region defined as chr1 200000 200200:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200200 stdout
chr1    200086    200087    rs1553131378    G    1    C,    0    0                2153    snv    rareSome,rareAll,    39816043134    53
chr1    200148    200149    rs1445446228    C    1    T,    0    0                2153    snv    rareSome,rareAll,    34583861530    60

The FAQ explains that you can obtain the bigBedToBed utility and other tools here: http://hgdownload.soe.ucsc.edu/admin/exe/
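For a table with many rows, the same lookup could be scripted. The following is a minimal Python sketch, not an official UCSC tool: the function names `build_command`, `parse_rs_ids`, and `lookup` are hypothetical, and it assumes that bigBedToBed is installed and on your PATH, that its output is tab-separated, and that your START and END values already follow BED conventions (0-based start, exclusive end).

```python
import subprocess

# dbSnp153 bigBed URL taken from the example command above.
DBSNP_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb"


def build_command(chrom, start, end, url=DBSNP_URL):
    """Return the bigBedToBed argument list for one region (hypothetical helper)."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]


def parse_rs_ids(bed_text):
    """Extract (chrom, start, end, rs_id) from bigBedToBed output.

    Assumes tab-separated lines whose fourth column is the rs identifier.
    """
    hits = []
    for line in bed_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        hits.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return hits


def lookup(chrom, start, end):
    """Run bigBedToBed for one region; needs the utility and network access."""
    result = subprocess.run(
        build_command(chrom, start, end),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rs_ids(result.stdout)
```

Calling `lookup("chr1", 200000, 200200)` would run the same command as the example above and return one tuple per SNP in the region. Each call makes a separate remote request, so for very large tables the Data Integrator approach may be more practical.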

You can also extract this information on the website using our Data Integrator tool: http://genome.ucsc.edu/goldenPath/help/hgIntegratorHelp.html

Given input like yours (CHR START END Effect_Allele Ref_Allele columns), you can create a custom track:

track name=myInputRegions
chr1    200086    200087 info_unique1
chr1    200148    200149 info_unique2
chr1    200235    200241 info_unique3
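If your table is large, the custom track text could be generated rather than typed by hand. This is a minimal Python sketch under stated assumptions: the function name `make_custom_track` and the label format are hypothetical, your coordinates are assumed to already follow BED conventions (0-based start, exclusive end), and packing the two alleles into the name column is just one way to keep them attached to each row.

```python
def make_custom_track(rows, track_name="myInputRegions"):
    """Build custom track text from (chrom, start, end, effect, ref) rows.

    The fourth column is a unique label per row; the alleles are packed
    into it (an illustrative choice) so they can be recovered afterwards.
    """
    lines = [f"track name={track_name}"]
    for i, (chrom, start, end, effect, ref) in enumerate(rows, start=1):
        label = f"row{i}_{effect}_{ref}"  # hypothetical label format
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"
```

The returned text can be pasted into the custom track submission box. The row number in each label keeps names unique even when two rows share the same alleles.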

Go to the Data Integrator and set the region to the entire genome (to speed things up, you could limit it to the region where your custom track has data). Add this custom track as the first item, then add the variation track dbSNP153, making sure to pick the view "Variants" and the subtrack "All dbSNP". Next, under Output Options, click Choose Fields. For the dbSnp153 table, clear all selections and select only "name", described as "dbSNP Reference SNP (rs) identifier". This will produce output like this:

# hgIntegrator: database=hg38 region=genome Fri Aug 14 10:52:02 2020
#ct_myInputRegions_6998.chrom    ct_myInputRegions_6998.chromStart    ct_myInputRegions_6998.chromEnd    ct_myInputRegions_6998.name    dbSnp153.name
chr1    200086    200087    info_unique1    rs1553131378
chr1    200148    200149    info_unique2    rs1445446228
chr1    200235    200241    info_unique3    rs1303983103
chr1    200235    200241    info_unique3    rs1373116750
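Note that one input region can match more than one rs ID, as info_unique3 does above. A minimal Python sketch for collecting the rs IDs per input region from saved Data Integrator output follows. The function name `group_rs_ids` is hypothetical, and it assumes the output is tab-separated with the five columns shown in the header line.

```python
from collections import defaultdict


def group_rs_ids(integrator_text):
    """Map each input region to the list of rs IDs that overlap it.

    Assumes tab-separated rows of chrom, chromStart, chromEnd, name, rs ID,
    with header and comment lines starting with '#'.
    """
    grouped = defaultdict(list)
    for line in integrator_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        chrom, start, end, name, rs_id = line.split("\t")[:5]
        grouped[(chrom, int(start), int(end), name)].append(rs_id)
    return dict(grouped)
```

With the output above, the key for info_unique3 would hold both rs1303983103 and rs1373116750, so you can then decide how to handle regions with several matches, for example by comparing alleles.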

Here is a session link to see this in operation where you can click "get output": https://genome.ucsc.edu/cgi-bin/hgIntegrator?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=RS_IDs_customRegions

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

