Extracting data from the table browser


Samarakkody, Ann

Nov 16, 2016, 10:22:59 AM
to gen...@soe.ucsc.edu

Hello,

I'm trying to extract data from the Table Browser and was hoping to get your help with that.


I want to extract the following from the human database:

1. Genomic intervals for all exons

2. Genomic intervals for all introns

3. A specific region (-500 to +500 from the TSS) of the promoter for all genes


I tried to address number 1 by using the exon start and end option in the Table Browser (screenshots attached below); however, I'm unable to extract the intervals from the output file. What would be the best way to extract the data for 1-3?

Any suggestions or directions would be greatly appreciated!

Thank you

Sincerely,


Ann Sanoji Samarakkody


PhD candidate - Nechaev Lab

Program of Anatomy and Cell Biology

Department of Biomedical Sciences

School of Medicine and Health Sciences

University of North Dakota


Jairo Navarro Gonzalez

Nov 18, 2016, 4:53:54 PM
to Samarakkody, Ann, gen...@soe.ucsc.edu
Hello Ann, 

Thank you for using the UCSC Genome Browser and your question about querying the Table Browser. 
If you are unfamiliar with the Table Browser, please see the User's Guide at http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html.

The easiest way to obtain these coordinates is to create three different BED files using the Table Browser. The steps to extract the genomic intervals for all exons and for all introns are nearly identical and simple. Getting the promoter coordinates for all genes with 500 bp of padding on each side of the TSS is a little more complex, and you will need to do some basic scripting to process the output.

To obtain Genomic Intervals for Exons and Introns
Step 1: Configure Table Browser Settings

Clade: Mammal 
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19) 
Group: Genes and Gene Predictions 
Track: UCSC Genes 
Region: genome
Output format: BED - browser extensible data

Step 2: Click 'get output'

Select either 'Exons plus' or 'Introns plus'
Select 'get BED'
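
Once the BED file is downloaded, the interval for each exon or intron is simply the first three columns of each line (chrom, start, end). The following is a minimal Python sketch of pulling those out, assuming a tab-separated BED file; the function name and the sample row are made up for illustration and are not real Table Browser output.

```python
def read_bed_intervals(lines):
    """Return (chrom, start, end) tuples from the lines of a BED file."""
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # BED columns 1-3 are chrom, chromStart, chromEnd.
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


# Made-up row, for illustration only.
sample = ["chr1\t100\t200\texample_exon\t0\t+"]
print(read_bed_intervals(sample))
```

To run it on a real download, pass an open file handle instead of the sample list, for example `read_bed_intervals(open("exons.bed"))`, where `exons.bed` is whatever name you gave the output file.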

To obtain specific region of promoter for all genes
Step 1: Configure Table Browser Settings

Clade: Mammal 
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19) 
Group: Genes and Gene Predictions 
Track: UCSC Genes 
Region: genome
Output format: selected fields from primary and related tables
Output file: enter a name for your file
Select 'get output'

Step 2: Choose output fields

From 'Select Fields from hg19.knownGene', select name, chrom, and txStart
click 'get output'

Step 3: Use a script to change the output file

You should have an output like:

#name      chrom  txStart
uc002ypa.3 chr21  33031934
Using a script, add another column to each line that holds a copy of the txStart value.
The result should look something like:
#name      chrom  txStart  txStartCopy
uc002ypa.3 chr21  33031934  33031934
Once you have your file in that format, you can subtract 500 bp from the txStart coordinate and add 500 bp to the txStartCopy position, which gives the start and end of a 1,000 bp window around the TSS. If you would like to view this file in the browser, you will also have to rearrange the columns into BED format (chrom, start, end, name).
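
As one possible way to write the script for Step 3, here is a minimal Python sketch. It assumes the Table Browser output has the three columns name, chrom, txStart separated by whitespace, and it writes the window directly in BED column order rather than creating an intermediate txStartCopy column. The function name `promoter_bed_lines` and the `flank` parameter are illustrative choices, not part of any UCSC tool.

```python
def promoter_bed_lines(lines, flank=500):
    """Turn 'name chrom txStart' rows into BED rows spanning TSS -/+ flank."""
    bed = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#name chrom txStart' header.
        if not line or line.startswith("#"):
            continue
        name, chrom, tss = line.split()[:3]
        tss = int(tss)
        start = max(0, tss - flank)  # clamp so the start never goes below 0
        end = tss + flank            # not clamped to the chromosome length
        # BED column order: chrom, chromStart, chromEnd, name
        bed.append(f"{chrom}\t{start}\t{end}\t{name}")
    return bed


example = ["#name\tchrom\ttxStart", "uc002ypa.3\tchr21\t33031934"]
for row in promoter_bed_lines(example):
    print(row)
# prints: chr21	33031434	33032434	uc002ypa.3
```

The sketch does not check the end coordinate against chromosome sizes, so a gene very close to the end of a chromosome could produce a window that runs past it.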

As a reminder, our coordinate positions have a 0-based start and a 1-based end. You can read more about this in our documentation here: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
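
To make the coordinate convention concrete, the short Python sketch below works through the arithmetic for a window of 500 bp on either side of the example txStart value 33031934. The helper names are hypothetical; the only facts relied on are that a BED start is 0-based and a BED end is 1-based, so the length of an interval is end minus start and the equivalent fully 1-based range begins at start + 1.

```python
def bed_length(start, end):
    """Number of bases covered by a 0-based-start, 1-based-end interval."""
    return end - start


def to_position_string(chrom, start, end):
    """Convert a BED-style interval to a 1-based 'chrom:start-end' string."""
    return f"{chrom}:{start + 1}-{end}"


tss = 33031934
start, end = tss - 500, tss + 500
print(bed_length(start, end))                   # 1000
print(to_position_string("chr21", start, end))  # chr21:33031435-33032434
```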

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genomics Institute


Jairo Navarro Gonzalez

Nov 22, 2016, 4:42:28 PM
to Samarakkody, Ann, gen...@soe.ucsc.edu
Hello Ann,

Unfortunately, my response to your question did not take into account genes on the negative strand, and my instructions will give you the incorrect TSS for their promoters: for a gene on the negative strand, transcription starts at txEnd rather than txStart. To fix this, you will have to run the steps under "To obtain specific region of promoter for all genes" twice, once for the positive strand and again for the negative strand. I am amending the steps "To obtain specific region of promoter for all genes" to account for these genes. Everything that I have added or changed will be in bold.

To obtain specific region of promoter for all genes
Step 1: Configure Table Browser Settings

Clade: Mammal 
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19) 
Group: Genes and Gene Predictions 
Track: UCSC Genes 
Region: genome
Filter: changes depending on the strand you are retrieving
  • For the positive (+) strand, your filter will be: strand does match +
  • For the negative (-) strand, your filter will be: strand does match -
Output format: selected fields from primary and related tables
Output file: enter a name for your file
select 'get output'

Step 2: Choose output fields

From 'Select Fields from hg19.knownGene', select name, chrom
If you are filtering for the positive strand, choose txStart
If you are filtering for the negative strand, choose txEnd
click 'get output'

Step 3 is the same as before and the script you wrote should still work. The only difference is that you will have to run your script for the two files created.
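
If it is convenient to end up with a single file, the two strand-specific outputs can be processed together and the strand recorded in the sixth BED column. The Python below is a sketch under the same assumptions as the steps above: each input has three whitespace-separated columns (name, chrom, and txStart for the plus file or txEnd for the minus file), and the third column is treated as the TSS exactly as given. Whether to shift the minus-strand coordinate by one base to account for the 0-based start and 1-based end convention is left to you. The function names and the minus-strand sample row are made up for illustration.

```python
def strand_promoters(lines, strand, flank=500):
    """Parse 'name chrom tss' rows into BED6 tuples for one strand."""
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#name chrom ...' header.
        if not line or line.startswith("#"):
            continue
        name, chrom, tss = line.split()[:3]
        tss = int(tss)
        # BED6 order: chrom, start, end, name, score (0 placeholder), strand
        rows.append((chrom, max(0, tss - flank), tss + flank, name, 0, strand))
    return rows


def merge_strands(plus_lines, minus_lines, flank=500):
    """Combine both strand files into BED6 lines sorted by chrom and start."""
    rows = strand_promoters(plus_lines, "+", flank)
    rows += strand_promoters(minus_lines, "-", flank)
    rows.sort(key=lambda r: (r[0], r[1]))
    return ["\t".join(str(field) for field in r) for r in rows]


plus = ["#name\tchrom\ttxStart", "uc002ypa.3\tchr21\t33031934"]
minus = ["#name\tchrom\ttxEnd", "exampleMinus\tchr21\t1000"]  # made-up row
for row in merge_strands(plus, minus):
    print(row)
```

With real downloads, pass open file handles instead of the lists, for example `merge_strands(open("plus.txt"), open("minus.txt"))`, where both file names are placeholders for whatever you entered under "Output file".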

Sorry for the confusion and any issues this may have caused. If you have any further questions, please reply to gen...@soe.ucsc.edu.

Jairo Navarro
UCSC Genomics Institute