sequence +/- 1kb from txStart

Jessilyn Dunn

Sep 15, 2014, 3:13:07 PM
to gen...@soe.ucsc.edu
Hello,

I am trying to use the UCSC browser to download the sequence for all genes in the mm9 genome, but only in the region +/-1kb from the txStart.

While I understand that there is a way to retrieve promoter sequences (-1kb, etc.) using the Table Browser (http://genome.ucsc.edu/FAQ/FAQdownloads.html#download18), this appears to be useful only for the upstream sequence, not for both the upstream and downstream sequences. I've also tried the Table Browser "region" definition and the user-defined regions (which are limited to 1,000, but there are ~34,000 genes I would need).

Any insight you can provide would be greatly appreciated!
Thank you very much!
Sincerely,
Jessilyn

Steve Heitner

Sep 15, 2014, 5:30:38 PM
to Jessilyn Dunn, gen...@soe.ucsc.edu

Hello, Jessilyn.

It is possible to do this by creating a custom track that contains your chromosome, transcription start site and gene name.  You will need to perform some basic scripting to process your output.  The general strategy will be as follows:

1. Use the Table Browser to create an output file with output like: chr1 134212701 Nuak2
2. Create a script that adds the start coordinate a second time: chr1 134212701 134212701 Nuak2
3. Load this edited file as a custom track in the Table Browser
4. Obtain sequence and specify 1,000 extra bases both upstream and downstream

Using the Table Browser, perform the following steps:

1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables

2. Select the following options:
Clade: Mammal
Genome: Mouse
Assembly: July 2007 (NCBI37/mm9)
Group: Genes and Gene Predictions
Track: RefSeq Genes
Table: refGene
Region: genome
Output format: selected fields from primary and related tables
Output file: enter a name for your file

3. Click the “get output” button

4. In the “Select Fields from mm9.refGene” section, check the “chrom”, “txStart” and “name2” checkboxes

5. Click the “get output” button

6. At this point, you will need to write a script to insert an additional txStart column into your output file.  Note that the contents of the refGene table are not sorted, so if you want your results to be ordered, you will also need to sort your file.

7. In the Table Browser, on the right side of the “group” line, click the “add custom tracks” button

8. Next to “Paste URLs or data”, click the “Browse” button to select your edited output file

9. Click the “Submit” button

10. Click the “go to table browser” button

11. Change “output format” to “sequence”

12. Click the “get output” button

13. Insert 1,000 into the upstream and downstream text boxes

14. Click the “get sequence” button
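
For step 6, a minimal Python sketch of the column-insertion script might look like the following. It assumes the Table Browser output is tab-separated, that any header line starts with "#", and that the columns are in the order chrom, txStart, name2. The function name add_second_start_column and the command-line file arguments are illustrative only, not part of any UCSC tool.

```python
import sys


def add_second_start_column(lines):
    """Turn 'chrom txStart name2' rows into sorted 'chrom txStart txStart name2' rows."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and any '#'-prefixed header line (assumed format).
        if not line or line.startswith("#"):
            continue
        chrom, tx_start, name2 = line.split("\t")[:3]
        rows.append((chrom, int(tx_start), name2))
    # refGene is not sorted, so order by chromosome name, then numeric start.
    rows.sort()
    # Repeat the start coordinate so each row has chrom, start, end, name.
    return [f"{chrom}\t{start}\t{start}\t{name}" for chrom, start, name in rows]


# Hypothetical usage: python add_start.py table_browser_output.txt edited_output.bed
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src:
        bed_lines = add_second_start_column(src)
    with open(sys.argv[2], "w") as dst:
        dst.write("\n".join(bed_lines) + "\n")
```

The resulting file is the edited output file that step 8 uploads as a custom track. Chromosome names are sorted as plain text here, so chr10 comes before chr2; adjust the sort key if a different order is needed.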


Please contact us again at gen...@soe.ucsc.edu if you have any further questions. 
All messages sent to that address are archived on a publicly-accessible Google Groups forum.  If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group
