error in amino acid alignment

Cheung,Hannah C

Sep 2, 2014, 12:10:06 PM
to gen...@soe.ucsc.edu
To Whom It May Concern,

There is a misalignment for the amino acid track for the gene XPO1. Please see the enclosed powerpoint.

Thanks!

Hannah

— 

Hannah Cheung, PhD, PMP

Research Scientist

Futreal Lab, Genomic Medicine

The University of Texas MD Anderson Cancer Center

1881 East Road, Unit 1954

3SCR5.4101

Houston, Texas 77054

Office: (713) 745-4678


XPO1-discrepancy.pptx

Jonathan Casper

Sep 2, 2014, 12:42:03 PM
to Cheung,Hannah C, gen...@soe.ucsc.edu

Hello Hannah,

Thank you for your report of a problem with the amino acid listing for the XPO1 gene. The issue that you describe arises because our alignments place the XPO1 gene on the - strand. This is easier to see if you zoom out a bit from the view in your screenshot: in the intron regions, we display little chevrons ("<<<") to indicate the direction of alignment. The 5' to 3' reference (+ strand) sequence for the codon you name is GAA, but read 5' to 3' on the - strand it becomes its reverse complement, TTC, which codes for phenylalanine.
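
For readers who want to check the strand flip by hand, here is a minimal Python sketch of a reverse complement. The function name `reverse_complement` is illustrative only and is not part of any UCSC tool; the sketch assumes plain A/C/G/T input with no ambiguity codes.

```python
# Map each unambiguous DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read 5' to 3'."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.upper().translate(COMPLEMENT)[::-1]


# The + strand codon from the report, as read on the - strand.
print(reverse_complement("GAA"))  # TTC
```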

You can also see this by expanding the "Mapping and Sequencing" track set, and then changing the display of the "Base Position" track to full. This makes the browser display amino acid translations for each of the three forward reading frames. Clicking the "-->" arrow in the top left corner beneath "Scale" will reverse the coding direction, allowing you to see translations for the three reverse reading frames.
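
The six reading frames mentioned above can be approximated outside the browser with a short Python sketch. It assumes the standard genetic code and plain A/C/G/T input; the names `six_frame` and `translate` and the frame labels ("+1" to "-3") are illustrative and do not necessarily match the browser's own frame numbering.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; ignore leftovers."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frame(seq):
    """Translate three forward and three reverse frames of seq."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for name, protein in sorted(six_frame("CATGAATTCGAACAG").items()):
    print(name, protein)
```

For a short + strand stretch such as CATGAATTCGAACAG, the "+1" frame starts with H and E, while the "-1" frame (the reverse complement read 5' to 3') gives LFEFM.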

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group



Cheung,Hannah C

Sep 2, 2014, 3:41:38 PM
to gen...@soe.ucsc.edu
Hi Jonathan,

I’m not sure that I made myself clear. The TTC should be a glutamic acid and not phenylalanine.

Thanks,
Hannah

Jonathan Casper

Sep 4, 2014, 2:41:20 PM
to Cheung,Hannah C, gen...@soe.ucsc.edu

Hello Hannah,

I might be confused by your question, but I will attempt to describe what I'm seeing here. The second slide of your powerpoint file displays an alignment of the sequence for the protein XPO1 to the hg19 genome assembly. The aligned sequence from the genome assembly is from the - strand. In this alignment, the protein contains the amino acid sequence LFEFM (bracketing the E that you describe). The corresponding nucleotide sequence from that portion of the alignment is CTG.TTC.GAA.TTC.ATG (periods marking the codon boundaries). In that alignment, TTC corresponds to F for phenylalanine, and GAA corresponds to E for glutamic acid.

In the screenshot of UCSC Genome Browser from the third page of your powerpoint file, the + strand of the genome assembly sequence is displayed. The XPO1 gene was aligned to the - strand, which means that it is displayed in reverse order. Going left-to-right in that browser screenshot, the amino acid sequence for the same bracketed position around the E is MFEFL - the reverse of the LFEFM from the alignment you showed. Similarly, the nucleotides displayed along the top of the page are for the + strand. The sequence displayed for that bracketed region is CAT.GAA.TTC.GAA.CAG. This is the reverse complement of the CTG.TTC.GAA.TTC.ATG sequence from your alignment, again because your alignment was to the - strand and the displayed sequence is coming from the + strand. If you click the '--->' arrow in the top left corner of the UCSC Genome Browser (below "Scale"), the browser will display the complemented nucleotide sequence. Please note that the sequence should then be properly read from right to left (matching the new direction of the arrow: "<---"). Read from right-to-left, the browser will map TTC to F and GAA to E, just as your alignment does.
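
The relationship between the two views can be reproduced with a small Python sketch. It assumes the standard genetic code; `minus_strand_labels` is a hypothetical helper written for this illustration and is not how the browser itself is implemented.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def minus_strand_labels(plus_codons):
    """Label each + strand codon with the residue a - strand gene encodes there."""
    return "".join(CODON_TABLE[reverse_complement(c)] for c in plus_codons)


# + strand codons in left-to-right display order, as quoted above.
plus_display = "CAT.GAA.TTC.GAA.CAG".split(".")
labels = minus_strand_labels(plus_display)

print(labels)        # MFEFL: left-to-right order in the browser view
print(labels[::-1])  # LFEFM: protein order, as in the alignment
```

Each + strand codon is reverse-complemented before lookup, which is why GAA on the + strand carries the label F for this - strand gene.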

Although the UCSC Genome Browser displays sequence from the + strand by default, our codon display for genes shows the amino acids of the protein itself, even when the gene is on the - strand. This can look like a discrepancy: in one part of the browser the displayed bases GAA are labeled E (for a + strand gene) and elsewhere F (for a - strand gene, where the coding strand reads TTC). It seemed like a better choice to display the correct amino acids for - strand proteins and accept that discrepancy than to display the amino acids that would be translated if - strand proteins were somehow read in the wrong direction.

Does this answer your question? If not, can you please provide more detail about why you think that there is a misalignment and what changes are required to correct it?

--
Jonathan Casper
UCSC Genome Bioinformatics Group
