[Genome] What is exon frame?


mousheng xu

Nov 22, 2006, 3:07:05 PM
Dear Help,

I am trying to map allele variants to hg18, but I am having difficulty
understanding part of the UCSC data. In particular, I am not sure what
"exonFrames" means in your sequence format files.

For example, the gene "CHRNA1" on chr12 '+' strand has

exon starts:
175320852,175321553,175322919,175326476,175327192,175330539,175332304,175332461,175337325,
exon ends:
175321229,175321793,175323143,175326714,175327388,175330649,175332349,175332607,175337416,
exon frames: 0,0,1,0,2,0,0,1,0,

My understanding is that there are 3 alternatively spliced transcripts, and
that the first transcript is composed of the exons corresponding to exon frame
'0', i.e. the 1st, 2nd, 4th, 6th, 7th, and 9th exons.

But there are a few problems:

1. Sometimes the number of "exon starts" does not equal the number of "exon
frames", e.g. gene "C1QA" (NM_015991) has two exons, but 3 exon frames
"-1,0,1,".

2. Many allele variants are reported as something like "codon 123" or
"IVS42" of gene "ABC" (refseq "NM_4321"), i.e. in terms of the mRNA. Is there a
simple way of mapping these relative positions to absolute hg18 (or
earlier assembly) coordinates? I hope I do not have to run "Blast" or "Blat"
and compare the mRNA with the genomic DNA.

YOUR HELP WILL BE HIGHLY APPRECIATED!!!

Sincerely,

Mousheng Xu

Research Fellow
BWH, Harvard Medical School

Ann Zweig

Nov 22, 2006, 6:35:38 PM
Hello Mousheng Xu,

Let me explain what the "exonFrames" field is for; I think that will answer
the rest of your questions as well. First of all, I'm not sure I found the
exact gene you were looking at. When I searched for your gene (CHRNA1) on
hg18, I found a RefSeq gene on chr2 on the - strand (you noted that it was on
chr12 on the + strand). I'm pretty sure you meant to type chr2, so I'm going
to work under that assumption.

The first thing to note is that this gene is on the - strand, so of the
nine exons, the *last* one listed in the table is really what we think of as
exon 1. Here is the exon structure for this gene (refSeq ID = NM_000079).

name          NM_000079
chrom         chr2
strand        -
txStart       175320569
txEnd         175337427
cdsStart      175321097
cdsEnd        175337368
exonCount     9
exonStarts    175320569,175321553,175322919,175326476,175327192,175330539,175332304,175332461,175337325,
exonEnds      175321229,175321793,175323143,175326714,175327388,175330649,175332349,175332607,175337427,
id            0
name2         CHRNA1
cdsStartStat  cmpl
cdsEndStat    cmpl
exonFrames    0,0,1,0,2,0,0,1,0,
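
To make the "last one listed is exon 1" point concrete, here is a minimal Python sketch that puts the exons of this row into transcript order. The function names (`parse_list`, `exons_in_transcript_order`) are made up for illustration and are not part of any UCSC tool. It assumes the usual table convention that start coordinates are 0-based and end coordinates are exclusive.

```python
def parse_list(field):
    """Turn a comma-terminated table field such as '1,2,3,' into [1, 2, 3]."""
    return [int(x) for x in field.split(",") if x]


def exons_in_transcript_order(strand, exon_starts, exon_ends):
    """Return (start, end) pairs ordered from exon 1 to the last exon.

    The table lists exons by increasing genomic coordinate, so for a
    '-' strand gene the list has to be reversed.
    """
    exons = list(zip(parse_list(exon_starts), parse_list(exon_ends)))
    if strand == "-":
        exons.reverse()
    return exons


# Values copied from the NM_000079 row above.
exon_starts = ("175320569,175321553,175322919,175326476,175327192,"
               "175330539,175332304,175332461,175337325,")
exon_ends = ("175321229,175321793,175323143,175326714,175327388,"
             "175330649,175332349,175332607,175337427,")

exons = exons_in_transcript_order("-", exon_starts, exon_ends)
for number, (start, end) in enumerate(exons, start=1):
    print(f"exon{number}: chr2:{start}-{end}")
# first line printed: exon1: chr2:175337325-175337427
```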

Because we are on the - strand, you will need to switch your display in the
browser to show the reverse complement when you are zoomed into the base level.
To do this click on the little arrow in the Base Position track ("--->").
After you click on it, it will switch to "<---" and the bases will be reverse
complemented.

When you view the gene in the browser, you will see the 9 exons (count
from right to left). So, for example, exon1 starts at chr2:175337427 and
ends at chr2:175337325. If you zoom into the end of exon1 you will see that the
protein contains the amino acid "A", but that only one nucleotide of its codon
lies in this exon: G. Now, zoom into the start of exon2 (at chr2:175332607).
Here you will see the amino acid "A" again, along with the other two
nucleotides (which join with the single one at the end of exon1): CT. So, the
end of exon1 plus the start of exon2 gives GCT, which codes for an "A" amino acid.

So, the exonFrames field tells you how the two exons join together. In
this case, there is 1 nucleotide at the end of exon1 that joins with the first
two nucleotides at the start of exon2. Because the gene is on the - strand, you
must also read the exonFrames field from right to left. So, the next-to-last
value in this field for this gene is '1', and it belongs to exon2. This means
that the first codon of exon2 picks up one nucleotide from exon1 to make the
amino acid.
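
One way to check this reading is to recompute the frames from the other fields: an exon's frame is the number of coding bases that come before it in the transcript, modulo 3. The Python sketch below is only an illustration under that assumption (`compute_exon_frames` is a hypothetical name, not a UCSC function), and it reproduces the exonFrames values for this transcript. It does not try to handle special cases such as annotated frameshifts.

```python
def compute_exon_frames(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Recompute exonFrames in table order; -1 marks exons with no coding bases."""
    frames = [-1] * len(exon_starts)
    order = range(len(exon_starts))
    if strand == "-":
        # Transcription runs from the last listed exon to the first.
        order = reversed(order)
    coding_so_far = 0
    for i in order:
        # Clip the exon to the coding region.
        start = max(exon_starts[i], cds_start)
        end = min(exon_ends[i], cds_end)
        if start >= end:
            continue  # exon lies entirely in the UTR
        frames[i] = coding_so_far % 3
        coding_so_far += end - start
    return frames


# Values copied from the NM_000079 row in this message.
starts = [175320569, 175321553, 175322919, 175326476, 175327192,
          175330539, 175332304, 175332461, 175337325]
ends = [175321229, 175321793, 175323143, 175326714, 175327388,
        175330649, 175332349, 175332607, 175337427]

frames = compute_exon_frames("-", 175321097, 175337368, starts, ends)
print(frames)  # [0, 0, 1, 0, 2, 0, 0, 1, 0]
```

Exon1 contributes 43 coding bases (175337368 - 175337325), and 43 mod 3 = 1, which is the single G that joins the CT at the start of exon2.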

To address question #1, an exonFrames value of -1 means that the exon is
entirely UTR.
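
As a small illustration of that rule, the following Python sketch picks out the UTR-only exons from an exonFrames field. The function name `utr_only_exons` is hypothetical, and the indices it returns are 0-based positions in table order (not transcript order).

```python
def utr_only_exons(exon_frames_field):
    """Return table-order indices of exons whose frame is -1 (entirely UTR)."""
    frames = [int(x) for x in exon_frames_field.split(",") if x]
    return [i for i, frame in enumerate(frames) if frame == -1]


# The exonFrames value quoted in the question for NM_015991.
print(utr_only_exons("-1,0,1,"))             # [0]
# The CHRNA1 transcript has no UTR-only exons.
print(utr_only_exons("0,0,1,0,2,0,0,1,0,"))  # []
```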

Please let us know if this explanation does not clear things up for you.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu