Dear Cristina,
Thank you for using the UCSC Genome Browser and for your question about using the Table Browser to extract SNPs specific to certain genes.
You are correct that when the Table Browser is given gene prediction (genePred) format input for an intersection, it will attempt to output only the exon regions. By providing the regions in BED format instead, you can avoid this and pull all the items between the start and end coordinates.
The quickest way to obtain BED regions for your genes of interest is to use the "knownCanonical" table instead of the "knownGene" table when creating your custom track. The knownCanonical table has one entry per gene (not multiple transcripts).
Go to the Table Browser and make the following selections:
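As a rough illustration of the difference (this is only a sketch, not the Table Browser's actual code), the Python below compares an exon-only overlap test with a whole-span overlap test. The gene span, exon blocks, and SNP positions are made-up numbers, and the function names are hypothetical.

```python
# Hypothetical gene model: all coordinates are invented for illustration.
# Intervals are 0-based and half-open, as in BED.
GENE_SPAN = (1000, 5000)
EXONS = [(1000, 1200), (2500, 2700), (4800, 5000)]


def overlaps(start, end, region_start, region_end):
    """Return True if [start, end) and [region_start, region_end) share a base."""
    return start < region_end and end > region_start


def in_exons(snp_start, snp_end, exons=EXONS):
    """Exon-only test: roughly what a genePred-based intersection reports."""
    return any(overlaps(snp_start, snp_end, s, e) for s, e in exons)


def in_span(snp_start, snp_end, span=GENE_SPAN):
    """Whole-span test: roughly what a 3-column BED region reports."""
    return overlaps(snp_start, snp_end, span[0], span[1])


# One SNP inside an exon, one inside an intron (both made up).
for label, snp in [("exonic", (1100, 1101)), ("intronic", (3000, 3001))]:
    print(label, "exon-only:", in_exons(*snp), "whole span:", in_span(*snp))
```

The intronic SNP is missed by the exon-only test but kept by the whole-span test, which is why the BED regions return everything between the start and end coordinates.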
group: Genes and Gene Predictions
track: UCSC Genes
table: knownCanonical (instead of the default knownGene)
region: genome
identifiers: Click "paste list" and paste something like the below:
IL9R
TNF
IRF6
IL2
IL6
IL9
IL10
IFNG
CSF2
RELA
NFKB1
Change "output format:" to "custom track", click "get output", and then click "get custom track in table browser."
Now you can follow the steps you were doing before:
group: Variation
track: All SNPs(150)
table: snp150
region: genome
intersection: Custom tracks -> My gene custom -> I choose: '...any overlap...' and click "submit"
You will then get all the SNPs for your gene regions.
There are also other useful ways to use the BED regions from your custom track output, such as entering them into the Multi-Region tool.
For example, if you output the regions of the custom track created in the first knownCanonical step, you'll get output like the following:
chr1 209958967 209979520
chr1 206940947 206945839
chr4 103422485 103538459
chr4 123372625 123377650
chr5 131409484 131411863
chr5 135227934 135231516
chr6 31543343 31546112
chr6 31544291 31546112
chr7 22766765 22771621
chrX 155227245 155240482
chrY 59330251 59343488
chr11 65421066 65430443
chr12 68548549 68553521
...
These were the regions that defined the larger snp150 selection, but they can also be used to slice the genome into views of the different genes. Go to the "View" menu and select "Multi-Region", then paste these coordinates into the "Enter Custom regions as BED..." box and click the radio button next to that option. You can also check the "Highlight alternating regions" box, which will help emphasize the slicing in the new virtual chromosome made up of these regions combined.
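Some of the regions in the list overlap (for example the two chr6 entries). If you would prefer each stretch of sequence to appear only once before pasting, a small script can merge them first. This is an optional sketch, not something the Multi-Region tool is known to require; the function names are hypothetical and only a few of the coordinates above are used as sample input.

```python
# A few of the regions from the list above, as whitespace-separated BED lines.
BED_TEXT = """\
chr6 31543343 31546112
chr6 31544291 31546112
chr7 22766765 22771621
chr1 209958967 209979520
"""


def parse_bed(text):
    """Parse 3-column BED lines into (chrom, start, end) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def merge_regions(regions):
    """Merge overlapping or touching regions on the same chromosome.

    Chromosome names are sorted as plain strings, so chr11 sorts before chr2.
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


for chrom, start, end in merge_regions(parse_bed(BED_TEXT)):
    print(chrom, start, end)
```

With the sample input, the two chr6 lines collapse into a single chr6 31543343 31546112 region and the other lines are unchanged.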
Here is an example session where this has been done and includes the knownCanonical custom track mentioned above:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg19.slicedGenome
Note, you can use "d v" to return to the default view, and see the "Multi-Region: Viewing discontinuous regions" video for more information: http://genome.ucsc.edu/training/vids/index.html
You may also notice in this session that a new Protein Interactions track is displayed. If you click into that track you will see our new Gene Interactions graph tool, which lets you explore the pathways. Lastly, you might also be interested in some of our other tools. The Data Integrator (http://genome.ucsc.edu/cgi-bin/hgIntegrator) is an advanced form of the Table Browser that allows you to extract regions from several tables together. The Variant Annotation Integrator (http://genome.ucsc.edu/cgi-bin/hgVai) takes a custom VCF track, rsIDs, or HGVS terms as input and analyzes them to give predicted effects.
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genomics Institute
Hi Cristina,
I just realized there is an even better solution than the intersection approach, one that will avoid timeout issues in the Table Browser.
With the regions created from the knownCanonical search of your gene list, you can use the "define regions" option under "region:" and you should get all the data for your locations.
Go to the Table Browser and make these selections:
group: Variation
track: All SNPs(150)
table: snp150
region: click "define regions"
Paste in the coordinates acquired by doing the knownCanonical step to generate your gene list of regions:
chr1 209958967 209979520
chr1 206940947 206945839
chr4 103422485 103538459
chr4 123372625 123377650
chr5 131409484 131411863
chr5 135227934 135231516
chr6 31543343 31546112
chr6 31544291 31546112
chr7 22766765 22771621
chrX 155227245 155240482
chrY 59330251 59343488
chr11 65421066 65430443
chr12 68548549 68553521
Click "submit", then set the output format to "all fields from selected table" if you want to access the information directly, or to "custom track" if you want to view the results in the browser, and click "get output".
This works well and is a much better solution than my earlier suggestion to query the whole genome with an intersection, which will time out in the Table Browser because the snp150 table has over 234 million entries. (In the first example I used snp150Common, which is about 16 times smaller at roughly 15 million entries.)
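If you save the tab-separated output, you may want to tally how many SNPs fall in each region. The sketch below is one way to do that in Python. It assumes the download begins with a "#"-prefixed header row that includes chrom, chromStart, chromEnd, and name columns; the sample rows and rs names are made up, and the function name is hypothetical.

```python
import csv
import io

# Made-up rows in the style of a tab-separated Table Browser download.
SAMPLE_OUTPUT = (
    "#chrom\tchromStart\tchromEnd\tname\n"
    "chr7\t22766800\t22766801\trsEXAMPLE1\n"
    "chr7\t22770000\t22770001\trsEXAMPLE2\n"
    "chr12\t68550000\t68550001\trsEXAMPLE3\n"
)

# Two of the regions from the list above.
REGIONS = [("chr7", 22766765, 22771621), ("chr12", 68548549, 68553521)]


def count_snps_per_region(table_text, regions):
    """Count rows whose interval overlaps each (chrom, start, end) region."""
    # Drop the leading '#' so the header row yields clean column names.
    reader = csv.DictReader(io.StringIO(table_text.lstrip("#")), delimiter="\t")
    counts = {region: 0 for region in regions}
    for row in reader:
        start, end = int(row["chromStart"]), int(row["chromEnd"])
        for region in regions:
            chrom, region_start, region_end = region
            if row["chrom"] == chrom and start < region_end and end > region_start:
                counts[region] += 1
    return counts


for region, count in count_snps_per_region(SAMPLE_OUTPUT, REGIONS).items():
    print(region, count)
```

To use it on a real download, read the saved file into a string and pass it in place of SAMPLE_OUTPUT, after checking that the column names match your file.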
Here is an update to that earlier session with these steps taken to create a new custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg19.slicedGenome
Thank you again for using the UCSC Genome Browser!
All the best,
Brian Lee