from BLAT to genome browser


Guinevere Q. Lee

Oct 5, 2017, 1:53:21 PM
to gen...@soe.ucsc.edu
Dear Genome Browser team,

I used BLAT to locate my sample to "chr 13 minus 67485903". My next task is to find out what this region is. Is it a gene? An exon? An intron? An enhancer? A CpG island? Given the chromosome number and coordinate, do you have any suggestions on how I should proceed?

What about batch processing (e.g., if I have 1000 samples with such BLAT-derived hg38 coordinates)?

Thank you very much!

Best,
Guin


Guinevere Q. Lee, Ph.D. HIV Molecular Virology
The Ragon Institute of MGH, MIT and Harvard
400 Technology Square, 
Cambridge, MA 02139
USA




Brian Lee

Oct 6, 2017, 7:44:56 PM
to Guinevere Q. Lee, gen...@soe.ucsc.edu

Dear Guin,

Thank you for using the UCSC Genome Browser and for your question about moving from BLAT results to exploring those results on tracks.

Are you by chance looking at SNP data? If so, the second part of my answer below, about our Variant Annotation Integrator (VAI) tool, might be of more interest.

Assuming you have BLAT results, here are some steps you could take. First, set up the browser to display the tracks you are most interested in viewing (the CpG island track, segmentation data tracks for enhancers, a gene track, etc.). Then BLAT your DNA, and on the results page click the button that says "Build a custom track with these results"; your BLAT hits will then appear as a custom track alongside the normal browser tracks.

Next, turn this BLAT data into BED3 data. Go to the Table Browser (top blue menu bar, under "Tools"), set the group to "Custom Tracks" to find your BLAT results, change the output format to "selected fields from primary and related tables", and click "get output". On the next page, select only the boxes next to chrom, chromStart, and chromEnd, and then click "get output" again. The result should look something like this:

#chrom chromStart chromEnd
chr1 7020247 7020498

Copy this data (you can also export it to a file) and return to the Genome Browser (top blue menu bar, "Genome Browser"). Click the "View" menu and select "Multi-Region". Paste the data into the "Enter Custom regions as BED..." (or URL to file) option and select the adjacent radio button. Also select the "Highlight alternating regions in multi-region view" option, and then click the "submit" button.

Now you have sliced the genome into all the regions where your BLAT results hit, and if you zoom out you will see each of these regions displayed with the tracks you selected at the start. Please note that the genome is now truncated to just the regions you have entered in multi-region, so you can't search for items beyond this virtual chromosome you have created. Clicking the "virt:####-#####" position box will remind you of the regions you have entered. Read more about multi-region here: http://genome.ucsc.edu/goldenPath/help/multiRegionHelp.html
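For the batch case (e.g., 1000 samples), BED3 lines can also be produced locally rather than one Table Browser export at a time. The sketch below is a hypothetical helper, not a UCSC tool: it assumes each hit is already reduced to a chromosome name and a single 1-based position (like "chr 13 minus 67485903"), and converts it to BED3's 0-based start, so position 67485903 becomes chr13 67485902 67485903, matching the single-location example further down. BED3 has no strand column, so the strand is dropped.

```python
"""Convert 1-based single-base positions into BED3 lines (0-based, half-open).

Hypothetical helper: the input tuples stand in for BLAT-derived hg38 hits.
"""
from typing import Iterable, List, Tuple


def to_bed3(positions: Iterable[Tuple[str, int]], header: bool = True) -> List[str]:
    """Return BED3 lines; a 1-based position p becomes the interval [p-1, p)."""
    lines = ["#chrom\tchromStart\tchromEnd"] if header else []
    for chrom, pos in positions:
        if pos < 1:
            raise ValueError(f"1-based position must be >= 1, got {pos}")
        # Normalize names like "13" or "Chr13" to UCSC-style "chr13".
        name = chrom.strip()
        name = "chr" + (name[3:] if name.lower().startswith("chr") else name)
        lines.append(f"{name}\t{pos - 1}\t{pos}")
    return lines


if __name__ == "__main__":
    hits = [("chr13", 67485903), ("Chr13", 67485903)]
    print("\n".join(to_bed3(hits)))
```

Keeping the header line optional lets the same output be pasted where a plain list of regions is expected.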

You can also use those saved BED3 items as defined regions in another tool, the Data Integrator. Under the "Tools" menu in the top blue bar, select "Data Integrator", change the "region to annotate" to "defined regions", and paste in those BED3 coordinates. You can then add the tracks whose data overlapping these coordinates you want to extract. If you wish to extract multiple tracks simultaneously, check the User Guide (http://genome.ucsc.edu/goldenPath/help/hgIntegratorHelp.html) to get a sense of how the first track selected progressively restricts the output of the second and third tracks added (see the last graphic). You could also extract data one track at a time (much like the "define regions" button works in the Table Browser tool).
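The core operation behind annotating defined regions is an interval overlap test. The following is a simplified local analog for illustration only, not the Data Integrator's actual implementation; the feature names are hypothetical. Intervals follow BED conventions (0-based start, exclusive end), so an item starting exactly at a region's end does not overlap it.

```python
"""Report which annotation items overlap each defined region (BED-style)."""
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int  # exclusive
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals overlap when each one starts before the other ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(regions: List[Interval], items: List[Interval]) -> Dict[Interval, List[str]]:
    """Map each region to the names of the items it overlaps (simple O(n*m) scan)."""
    return {r: [i.name for i in items if overlaps(r, i)] for r in regions}


if __name__ == "__main__":
    regions = [Interval("chr13", 67485902, 67485903, "insertion1")]
    items = [
        Interval("chr13", 67480000, 67490000, "featureA"),  # hypothetical item spanning the site
        Interval("chr13", 67485903, 67486000, "featureB"),  # hypothetical item starting just after it
    ]
    for region, names in annotate(regions, items).items():
        print(region.name, names)
```

A linear scan is fine for a few thousand regions; larger inputs would usually sort intervals or use an interval index instead.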

Lastly, perhaps you have single-nucleotide positions you are interested in exploring. Our VAI tool (Tools menu) allows you to upload a VCF or pgSnp format track, or to use HGVS terms or rs# identifiers. For example, if you knew that your example item chr13 67485902 67485903 (now in BED3 format for a single location, with a 0-based start) represented the genome changing from a G to an A nucleotide, you could convert those coordinates to an artificial pgSnp format like the following (add it as a custom track via the "My Data" menu, "Custom Tracks", and then "add custom tracks"):

track type=pgSnp name=SNP description="chr13:67,485,902G>A" 
chr13 67485902 67485903 G/A 2 0,0 0,0

Then, when you go to the VAI tool, you can select that track as the input for "variants:" and click "Get results", which would show something like:

chr13_67485903_G/A chr13:67485903 A - - - intergenic_variant - - - - - - -
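With many variants, the pgSnp custom track can be generated rather than typed by hand. This is a sketch assuming the same column layout as the example track above: chrom, 0-based chromStart, chromEnd, alleles joined by "/", allele count, then comma-separated allele frequencies and scores (left as 0 here, as in the example). The function name and the (chrom, 1-based position, ref, alt) input shape are illustrative assumptions.

```python
"""Build a pgSnp custom track from single-nucleotide changes (illustrative sketch)."""
from typing import List, Tuple


def pgsnp_track(variants: List[Tuple[str, int, str, str]],
                name: str = "SNP", description: str = "SNP") -> str:
    """variants: (chrom, 1-based position, ref allele, alt allele) tuples."""
    lines = [f'track type=pgSnp name={name} description="{description}"']
    for chrom, pos, ref, alt in variants:
        alleles = [ref, alt]
        zeros = ",".join("0" for _ in alleles)  # placeholder frequencies/scores
        # chromStart is 0-based, so a 1-based position p spans [p-1, p).
        lines.append(f"{chrom} {pos - 1} {pos} {'/'.join(alleles)} {len(alleles)} {zeros} {zeros}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(pgsnp_track([("chr13", 67485903, "G", "A")]))
```

For the single example variant, the data line produced matches the hand-written one above.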

To learn more about our tools we have a training page with video links http://genome.ucsc.edu/training/index.html and we have an archived mailing list you can search https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome where you can often get more in-depth information we can't always fit into every email response.

Here is a mailing list answer that introduces the BED format in its BED3, BED4 (adds a name), and BED9 (enables individual item color encoding) versions:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/xU_otQ4yClc/8TAb0zvxBQAJ

Here is a mailing list answer that introduces making type=pgSnp format tracks: https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/hhBxD1KrAEU/etlMwJKiAQAJ

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute


---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.

Guinevere Q. Lee

Oct 11, 2017, 10:39:08 AM
to Brian Lee, gen...@soe.ucsc.edu
Dear Brian,

Thank you very much for your detailed response! I much appreciate your effort in helping a newbie here!

My research question is: my retrovirus has inserted into "Chr13: 67485903". What is the identity of the gene (exon/intron) that the virus disrupted? Is this a CpG island?

Following your instructions, I entered "Chr13: 67485903" in the table browser:
clade: Mammal
genome: Human
assembly: Dec 2013 (GRCh38/hg38)
group: Genes and Gene Predictions
track: GENCODE v24 (<<< not sure what this is)

And I got this:
#name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	proteinID	alignID
# No results in given region.

Does this mean that nothing is known about this genomic region (Chr13: 67485903)?
Many thanks again!

Regards,
Guin



Christopher Lee

Oct 12, 2017, 2:08:27 PM
to Guinevere Q. Lee, Brian Lee, gen...@soe.ucsc.edu
Hi Guin,

Please start by reviewing the videos and other materials posted on our
training page:
http://genome.ucsc.edu/training/

Specifically, see the following video, which illustrates how to find a list
of genes in a given region.
https://www.youtube.com/watch?v=RjB_N1GpT0U

And the OpenHelix videos:
http://www.openhelix.com/ucsc

If you are confused about what a track name like "GENCODE V24" means,
click the "describe table schema" button on the Table Browser where
you will be directed to a page like the following:
http://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema

This page displays the fields of the table you have selected, a short
description of each field, and some sample rows. If the selected table
happens to be the primary table for a track on the Genome Browser,
then you will also see a description of the track data and how it was
generated, along with some Methods and perhaps a References section.
You should read these pages (called track description pages) to get a
better idea of the data underlying a given track and to decide whether
or not to use the track data in your analysis.

The results you received from your Table Browser query only indicate
that there is no data in the current version of the GENCODE V24 track
at that location, but it does not necessarily mean that there is
nothing known about that particular position. You can try different
gene tracks, or different tracks altogether, to see if there is any
data overlapping your position. You can also enter your position into
the Genome Browser:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=chmalee&hgS_otherUserSessionName=hg38_chr13_position

and then turn different tracks on to see if there is any data
overlapping your position, and then use the Table Browser to extract
the data.
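Once a gene track's rows have been extracted from the Table Browser, a position can be checked against each transcript's exon coordinates. This sketch assumes tab-separated output with the genePred-style columns shown earlier in this thread (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...), where starts are 0-based and the exon lists are comma-separated. The transcript row in the example is made up, and the sketch does not distinguish coding exons from UTRs.

```python
"""Classify a 1-based position as exon, intron, or outside a transcript (sketch)."""


def classify(row: str, chrom: str, pos1: int) -> str:
    f = row.rstrip("\n").split("\t")
    tx_chrom, tx_start, tx_end = f[1], int(f[3]), int(f[4])
    # exonStarts/exonEnds are comma-separated and usually end with a trailing comma.
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    p = pos1 - 1  # convert the 1-based browser coordinate to a 0-based base index
    if tx_chrom != chrom or not (tx_start <= p < tx_end):
        return "outside transcript"
    for s, e in zip(starts, ends):
        if s <= p < e:
            return "exon"
    return "intron"


if __name__ == "__main__":
    # Made-up two-exon transcript: exons [100,200) and [400,500) on chr13.
    made_up = "\t".join(["txA", "chr13", "+", "100", "500", "150", "450", "2", "100,400,", "200,500,"])
    for pos in (101, 250, 401, 600):
        print(pos, classify(made_up, "chr13", pos))
```

Running the same check over every row returned for a region shows, per transcript, whether an insertion site falls in an exon or an intron.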

I hope this is helpful, please let us know if you have any further questions!

Thank you again for your inquiry and using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.

Christopher Lee
UCSC Genomics Institute
