exon and intron coordinates


Olivia

Dec 17, 2013, 11:52:04 AM
to gen...@soe.ucsc.edu
Hi dear genome browser staff,

I have a question about finding exon and intron coordinates in the goldenPath download folder. I downloaded the alignment file for a gene's location (chr1:xxx-xxx), and now I'd like to know whether each of these positions falls into an exonic region or an intronic region. Does UCSC keep a file that tracks this? Thanks a lot for your help!

Best,
Olivia

Steve Heitner

Dec 17, 2013, 2:54:23 PM
to Olivia, gen...@soe.ucsc.edu
Hello, Olivia.

Could you please supply some additional information including the gene
you're referring to, the directory you downloaded the file from and which
file you downloaded?

Please contact us again at gen...@soe.ucsc.edu if you have any further
questions. Questions sent to that address will be archived in a
publicly-accessible forum for the benefit of other users. If your question
contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group


Olivia

Dec 17, 2013, 3:03:59 PM
to gen...@soe.ucsc.edu, st...@soe.ucsc.edu
Hi all,

The genes will be from Drosophila melanogaster and human. Unfortunately I don't have the list right now, but it will consist of about 1000 genes that my supervisor is interested in. As an example, I downloaded multiz15way and multiz46way as the alignment sequences, but I need to know, for each position in the reference gene, whether it belongs to an exonic region, an intronic region, or a UTR. I would therefore like to find the file containing the exon and intron coordinates for each gene so that I can batch process them. Let me know if this is not clear. Thanks for the fast response.

Best,
Olivia

Steve Heitner

Dec 17, 2013, 3:37:02 PM
to Olivia, gen...@soe.ucsc.edu
Hello, Olivia.

This information definitely helps. We do have download files for our gene
tracks that contain the coordinates of ALL genes. I think the best thing
for you to do, however, would be to enter your list of genes into our Table
Browser and just pull up the information for the genes you're specifically
interested in. If you're unfamiliar with the Table Browser, I recommend
viewing the User's Guide at
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html.

We have several gene tracks. I'm not certain which format your gene names
are in, but for this example, I will assume they are in RefSeq format.
Follow the steps below for a list of human genes:

1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables

2. Select the following options:
Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Prediction Tracks
Track: RefSeq Genes
Table: refGene
Region: genome

3. On the "identifiers" line, click the "paste list" button

4. Note at the top of this new screen, it shows you examples of what your
gene IDs are expected to look like. If your IDs don't look like this, you
will end up getting an error. Paste your list of gene IDs into the text box
and click the "submit" button.

5. On the "output format" line, select "all fields from selected table" to
list all fields in your output. If you would like to pick and choose which
fields you would like in your output, select "selected fields from primary
and related tables" instead.

6. Click the "get output" button. If you chose "selected fields from
primary and related tables" above, this is where you can choose your output
fields.

There are a number of gene tracks to choose from with Drosophila
melanogaster as well. You can experiment to see which ones match your gene
ID format.

Note also that if you select "BED - browser extensible data" as your output
format, you can further break your output down by exon, intron, 5' UTR, 3'
UTR, and coding regions. You can experiment to see which output formats and
options work best for your needs.

Olivia

Dec 17, 2013, 4:02:44 PM
to st...@soe.ucsc.edu, gen...@soe.ucsc.edu
Hey Steve,
Thanks for the detailed answer, I appreciate it. Yes, I've been using the Table Browser to retrieve the sequences, but I was hoping for a way to do this on the command line instead of clicking through the website. The file I downloaded is the alignment from the genomic data (multiz15way), and I need to go through every column of the alignment to do some analysis. While doing so, I'd like to know whether the current position is in an exon, an intron, or a UTR, since the case of the letters in the multiz alignment file doesn't indicate the region. So, as you said, the BED files contain the exon location information, right? Otherwise the Table Browser wouldn't be able to split the output into different regions. If so, I might download the BED files and parse out the locations myself. Please correct me if I'm wrong. Thanks a lot!

Best,
Olivia

Steve Heitner

Dec 17, 2013, 5:01:14 PM
to Olivia, gen...@soe.ucsc.edu
Hello, Olivia.

When you get BED file output from the Table Browser, it lists specifically
the coordinates of the exon regions for each gene ID (or introns or whatever
you specify in your query). You can run a Table Browser query without
specifying an output filename to see the output on the screen without saving
it to a file. I recommend running a couple of queries with various options
to see if they'll work for what you're hoping to do.
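
For offline processing, a saved exon BED file can be turned into a simple lookup. The following is only a minimal Python sketch, not a UCSC tool: it assumes whitespace-separated BED lines with at least chrom, start and end columns (0-based start, end not included), and the function names load_bed and features_at are made up for this illustration.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Group BED intervals by chromosome, sorted by start coordinate."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines of a BED file.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        by_chrom[chrom].append((start, end, name))
    for intervals in by_chrom.values():
        intervals.sort()
    return dict(by_chrom)


def features_at(index, chrom, pos):
    """Return the names of all intervals containing the 0-based position."""
    intervals = index.get(chrom, [])
    # Only intervals that start at or before pos can contain it.
    hi = bisect.bisect_right(intervals, (pos, float("inf"), ""))
    return [name for start, end, name in intervals[:hi] if pos < end]
```

A position that returns an empty list is not covered by any interval in the file, so running the same lookup against separate exon, intron and UTR BED outputs would label each alignment column. The scan over candidate intervals is linear, which should be fine for about 1000 genes but is not tuned for whole-genome use.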

I'm not sure exactly how you're planning on analyzing your data, but if
you're looking for command line options, another option you might consider
is querying our public MySQL databases. Instructions for connecting to our
MySQL server can be found at
http://genome.ucsc.edu/goldenPath/help/mysql.html. For example, if you're
looking at a specific hg19 gene, say NM_052956, and you would like to
determine the exon start/stop coordinates, you could run the following
query:

select * from refGene where name="NM_052956" \G

With a little work, this could also be done in batch fashion. This
essentially pulls data from the same source that the Table Browser pulls
from, though the Table Browser presents you with a few additional options.
I'm not sure if this will work for your needs, but I thought it was worth
mentioning.
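
The refGene table stores exon boundaries in the exonStarts and exonEnds fields as comma-separated lists, alongside strand, cdsStart and cdsEnd, with 0-based start coordinates. As a rough sketch of how one such row could be turned into exon, intron and UTR labels, here is some Python; the helper names are hypothetical, the coordinates in the usage note are invented rather than taken from any real gene, and treating a row with cdsStart equal to cdsEnd as noncoding is an assumption of this sketch.

```python
def parse_coords(field):
    """Turn a comma-separated list such as '100,300,500,' into integers."""
    return [int(x) for x in field.strip(",").split(",") if x]


def exons_and_introns(exon_starts, exon_ends):
    """Return (exons, introns) as lists of (start, end) pairs."""
    exons = list(zip(parse_coords(exon_starts), parse_coords(exon_ends)))
    # An intron is the gap between the end of one exon and the next start.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    return exons, introns


def classify(pos, strand, cds_start, cds_end, exons):
    """Label a 0-based position relative to one transcript."""
    if not any(start <= pos < end for start, end in exons):
        # Inside the transcript span but not in an exon means intron.
        if exons and exons[0][0] <= pos < exons[-1][1]:
            return "intron"
        return "outside"
    if cds_start == cds_end:
        # Assumption: an empty coding range marks a noncoding transcript.
        return "noncoding exon"
    if cds_start <= pos < cds_end:
        return "CDS"
    # Exonic but outside the CDS: which UTR depends on the strand.
    upstream = pos < cds_start
    return "5' UTR" if upstream == (strand == "+") else "3' UTR"
```

For example, with exonStarts "100,300,500,", exonEnds "200,400,600,", cdsStart 150 and cdsEnd 550 on the + strand, position 120 would be labelled as 5' UTR and position 250 as intron. A gene with several transcripts has several rows, so a position may get different labels depending on the transcript chosen.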

Olivia

Dec 17, 2013, 5:04:48 PM
to st...@soe.ucsc.edu, gen...@soe.ucsc.edu
Hi Steve,

I think the MySQL query is what I need. From there, I can access the start/stop locations of the exons and introns. Thank you very much, Steve! Much appreciated!

Best,
Olivia

Hiram Clawson

Dec 17, 2013, 5:07:00 PM
to Olivia, gen...@soe.ucsc.edu
Good Afternoon Olivia:

Exon and Intron bed files can be extracted directly from
the table browser for your off-line processing. Note suggested
procedures at the bottom of this wiki page:
http://genomewiki.ucsc.edu/index.php/Gene_Set_Summary_Statistics

--Hiram

Yang Zhang, Miss

Dec 18, 2013, 4:30:48 PM
to gen...@soe.ucsc.edu
Hi guys,

Just to follow up on the last question I asked: I am wondering whether UCSC has a way to retrieve MAF alignments in batch, but as separate files, one per input region. Say that in the Table Browser I select Comparative Genomics and multiz15way, and give user-defined regions like this:

chr2L 20678917 20681844
chr2L 8403574 8408853
chr3R 4389847 4394970
chrX 21884247 21889294
chr2R 9326815 9337990
chr2R 1856599 1862122
chr2L 5279050 5283018

If I select MAF as the output format, I get a single MAF file. How can I get separate MAF files, one for each row (each row is a gene), all at the same time? I need to convert each of these MAF files to a FASTA file, and it would be really tedious to paste one location at a time into the Table Browser and convert it to FASTA in Galaxy. So I am hoping there is a way to do this automatically. If UCSC has any executables that can complete this task, please let me know. Thank you so much for the help!

Best,
Olivia

Pauline Fujita

Dec 19, 2013, 4:49:06 PM
to Yang Zhang, Miss, gen...@soe.ucsc.edu
Hello Olivia,

We have a number of command line tools for working with MAF files:

mafAddIRows
mafFilter
mafMeFirst
mafSpeciesSubset
mafToPsl
mafAddQRows
mafFrag
mafOrder
mafSplit
mafsInRegion
mafCoverage
mafFrags
mafRanges
mafSplitPos
mafFetch
mafGene
mafSpeciesList
mafToAxt

To obtain these utilities, you can grab the precompiled binaries here,
if your OS matches one of those for which we offer them:

http://hgdownload.cse.ucsc.edu/admin/exe/

Alternatively, you will find info about grabbing all of our utilities as a batch here:

http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/userApps/README

You can see the usage message for any utility by running it without
any arguments. It sounds like the program "mafFrag" might be what
you're looking for.

Hopefully this was helpful. If you have any further questions, please
reply to gen...@soe.ucsc.edu. All messages sent to that address are
archived on a publicly-accessible Google Groups forum. If your
question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.

Best regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu