SNPs search from a list of genes within certain distance


Kei Hang Katie Chan

Mar 21, 2014, 5:23:57 PM
to gen...@soe.ucsc.edu
To Whom It May Concern,

I'm interested in finding all the SNPs that fall within my list of genes, as well as those within a certain distance upstream and downstream of my genes, e.g. 100 kb. Can I do this kind of SNP selection using the Table Browser? If so, how?

Thank you very much!

Katie

Jonathan Casper

Mar 24, 2014, 8:35:12 PM
to Kei Hang Katie Chan, gen...@soe.ucsc.edu

Hello Katie,

Thank you for your question about finding SNPs near your genes of interest. You can do this kind of selection using the UCSC Table Browser, though it will take some extra steps because you also want to examine upstream and downstream regions. The easiest way to do this is to start with a list of the positions of your genes in 3- or 4-column BED format (http://genome.ucsc.edu/FAQ/FAQformat.html#format1). You'll then need to pad the start and end positions out by 100,000 bases to allow for the upstream and downstream regions. If you have a short list of genes, this may be easy enough to do by hand. Otherwise, you can do this to your BED file by running the following Unix command (assuming your BED file is named mybedfile.bed):

awk 'BEGIN {OFS="\t"} {$2 = ($2 > 100000 ? $2 - 100000 : 0); $3 += 100000; print}' mybedfile.bed > mybedfile.padded.bed

If you are not using a Unix computer, you can instead use the text manipulation tools at Galaxy (http://usegalaxy.org) to do something similar.
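If Python is available, the same padding can be sketched as below. This is only an illustration, not a UCSC tool: the function names and the output file name are hypothetical, and it assumes a whitespace-separated BED file with at least three columns. It stops the start coordinate at 0 so that genes near the beginning of a chromosome do not get negative positions. It does not cap the end coordinate at the chromosome length.

```python
def pad_bed_line(line, pad=100000):
    """Pad one BED line by `pad` bases on each side, never going below 0."""
    fields = line.split()
    start = max(0, int(fields[1]) - pad)
    end = int(fields[2]) + pad
    # Keep any extra columns (name, score, ...) unchanged.
    return "\t".join([fields[0], str(start), str(end)] + fields[3:])


def pad_bed_file(in_path, out_path, pad=100000):
    """Write a padded copy of a BED file; header and blank lines pass through."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            stripped = line.strip()
            if not stripped or stripped.startswith(("track", "browser", "#")):
                dst.write(line)
                continue
            dst.write(pad_bed_line(stripped, pad) + "\n")
```

For example, pad_bed_file("mybedfile.bed", "mybedfile.padded.bed") would write the padded regions to a new file and leave the original untouched.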

Once you have a BED file of your gene locations plus the 100kb padding on each end, you can use the Table Browser to intersect that with the SNPs track and get a list of only the overlapping SNPs. The following steps assume you want results for the snp138 track of the hg19 human genome assembly, and that you are interested in results for fewer than 1000 genes. If you are interested in more than 1000 genes, you can either split your list or load it as a custom track and use the Table Browser "intersection" tool to obtain results. More information about intersections is available here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection.
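If you choose to split a longer list, a minimal sketch of the chunking might look like this. The function name is hypothetical, and the default chunk size of 999 is an assumption taken from the "fewer than 1000 genes" wording above.

```python
def split_regions(lines, max_regions=999):
    """Split a list of BED lines into consecutive chunks of at most max_regions."""
    return [lines[i:i + max_regions] for i in range(0, len(lines), max_regions)]
```

Each chunk can then be pasted or uploaded separately, and the results combined afterwards.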

1. Open the Custom Track page at http://genome.ucsc.edu/cgi-bin/hgCustom
2. Click "add custom track" 
3. Upload your BED file and click "Submit" 
4. Click "go to table browser" 
5. Use the following settings:

Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Variation
Track: All SNPs(138)
Table: snp138

6. For the "Region" heading, click the "define regions" button and either upload or paste in your BED file.
7. Click "submit" to return to the main Table Browser page.
8. Choose the output format that you want (I suggest "all fields from selected table") and click "get output".

The result should be a list of all SNPs that overlap your regions of interest.
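If you would like to spot-check the output, the overlap test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how the Table Browser works internally: the function names are hypothetical, regions are (chrom, start, end) tuples, SNPs are (chrom, start, end, name) tuples, and all coordinates are treated as BED-style half-open intervals. Zero-length features such as insertions may need special handling that is not shown here.

```python
def overlaps(region, feature):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return (region[0] == feature[0]
            and feature[1] < region[2]
            and feature[2] > region[1])


def snps_in_regions(regions, snps):
    """Return names of SNPs (chrom, start, end, name) that overlap any region."""
    return [snp[3] for snp in snps
            if any(overlaps(region, snp[:3]) for region in regions)]
```

Checking every SNP against every region is slow for large inputs, but it is usually adequate for verifying a handful of genes by hand.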

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group


