batch query for SNPs at chromosome coordinates

Ronald Worthington

Sep 27, 2013, 9:36:31 AM
to gen...@soe.ucsc.edu
Is it possible to do a batch query of the SNP database that would return any extant SNPs at the specified coordinate, such as:

input = chr6:31324144-31324144

output = dbSNP build 137 rs4997052


Brian Lee

Sep 27, 2013, 6:59:57 PM
to Ronald Worthington, gen...@soe.ucsc.edu

Dear Ron,

Thank you for using the UCSC Genome Browser and for your question about pulling SNPs given genomic coordinates.


In that mailing list question, a user is doing the reverse of searching for coordinates given a long list of rs#### identifiers. How many coordinates are you looking to query? You could do the same using the Table Browser's "define regions" button; however, it is limited to 1,000 regions per query.

To use the Table Browser, select the "Variation and Repeats" group, find the "All SNPs(137)" track, and select the "snp137" table. Click the "define regions" button, then add locations either by clicking the "Choose File" button or by pasting coordinates into the box. Any region that spans only a single base must be widened by one, that is, chr6:31324144-31324144 has to become chr6:31324144-31324145. Next, change the output format to "selected fields from primary and related tables" and click "get output". On the following page, select just the fields you want, such as "chrom", "chromStart", "chromEnd" and "name", and click "get output" again to obtain results such as:

chr6 31324143 31324144 rs4997052
chr6 31324144 31324145 rs9266150
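
The widening of single-base regions described above can be scripted before pasting into "define regions". What follows is only a minimal sketch: regions.txt and the output file names are hypothetical, and it assumes chromosome names contain no ":" or "-" characters. The last step splits the list into pieces of at most 1,000 regions to stay within the Table Browser limit.

```shell
# regions.txt is a hypothetical input file with one chrN:start-end per line.
printf 'chr6:31324144-31324144\nchr19:7143562-7143563\n' > regions.txt

# Widen any region whose start equals its end by one base;
# regions that already span more than one base are left unchanged.
awk -F'[:-]' '{ if ($2 == $3) $3 = $3 + 1; printf "%s:%d-%d\n", $1, $2, $3 }' \
    regions.txt > regions.widened.txt

# Split into files of at most 1,000 lines each (regions.chunk.aa, .ab, ...).
split -l 1000 regions.widened.txt regions.chunk.
```

Each resulting regions.chunk.* file could then be uploaded in a separate Table Browser query.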

Another option, if you have many thousands of coordinates, would be to follow the suggestion of downloading the snp137.txt.gz file and then running a command like:

zcat snp137.txt.gz | grep -Fwf myCoordinates.txt > mySnps.txt

where myCoordinates.txt is a list of tab-delimited coordinates (with no trailing spaces) like:
chr19 7143144 7143144
chr19 7143562 7143563
chr19 7143574 7143575

You would then have one line for each matching coordinate in the mySnps.txt file. You could further select out just the rs identifiers and coordinates with a command like:

awk '{print $2,$3,$4,$5}' mySnps.txt

To get output like:

chr19 7143574 7143575 rs191708249
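
Before running the pipeline above against the full download, it may help to try it on a tiny stand-in file to confirm that the pattern file really contains tabs. The sketch below is a dry run under stated assumptions: snp_mock.txt.gz is a hypothetical mock with only the first five snp137 columns (bin, chrom, chromStart, chromEnd, name), the bin values are placeholders, and the rsMOCK1 row is invented to show why the -w flag matters.

```shell
# Mock stand-in for snp137.txt.gz; bin values are placeholders and the
# second row (rsMOCK1) is made up, with an end coordinate one digit longer.
printf '0\tchr19\t7143574\t7143575\trs191708249\n0\tchr19\t7143574\t71435750\trsMOCK1\n' \
    | gzip > snp_mock.txt.gz

# Pattern file with real tab characters between fields and no trailing spaces.
printf 'chr19\t7143144\t7143144\nchr19\t7143574\t7143575\n' > myCoordinates.txt

# Same commands as above, run on the mock file (gzip -dc reads .gz like zcat).
# -F treats patterns as fixed strings, -w requires whole-word matches,
# so the rsMOCK1 row (end 71435750) is not reported.
gzip -dc snp_mock.txt.gz | grep -Fwf myCoordinates.txt > mySnps.txt
awk '{print $2,$3,$4,$5}' mySnps.txt
# prints: chr19 7143574 7143575 rs191708249
```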

This strategy only works if your exact coordinates are listed in the snp137 table. For example, "chr6 31324144 31324144" will not find any match, because the table stores a 0-based chromStart, so a single-base SNP at position N is listed as N-1 to N. A shortened entry with only the first coordinate, "chr6 31324144", will find the match "chr6 31324144 31324145 rs9266150", yet miss "chr6 31324143 31324144 rs4997052".
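
One way around the exact-match limitation is to match on the position itself rather than on a literal string. This is not the grep approach above but an awk lookup swapped in for illustration, and only a sketch: positions.txt (chrom, tab, 1-based position) and snp_mock.txt are hypothetical files, the bin values are placeholders, and only single-base SNPs (chromStart equal to chromEnd minus one) are reported.

```shell
# Mock rows in snp137 column order (bin, chrom, chromStart, chromEnd, name);
# bin values are placeholders and the real table has more columns.
printf '0\tchr6\t31324143\t31324144\trs4997052\n0\tchr6\t31324144\t31324145\trs9266150\n0\tchr19\t7143574\t7143575\trs191708249\n' > snp_mock.txt

# Hypothetical query list: chrom<TAB>1-based position.
printf 'chr6\t31324144\nchr19\t7143575\n' > positions.txt

# First file: remember each chrom/position pair as a lookup key.
# Second file: keep single-base SNP rows whose chrom and chromEnd match a key.
awk -F'\t' -v OFS='\t' '
  NR == FNR { want[$1 FS $2] = 1; next }
  ($2 FS $4) in want && $3 == $4 - 1 { print $2, $3, $4, $5 }
' positions.txt snp_mock.txt > mySnps_by_position.txt
```

With the real download, the second input could be read from the decompressed stream instead, for example `zcat snp137.txt.gz | awk ... positions.txt -`. In this mock, the query chr6 31324144 returns rs4997052 and not rs9266150.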

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group



