Find out TSS positions


joana apolonio

Aug 26, 2015, 12:04:36 PM
to gen...@soe.ucsc.edu
Hi,

I have a question about how to find the TSS position in the UCSC Genome Browser.

I found a tutorial that explains how to get the transcription start sites (TSS) for all genes in the human genome from the UCSC Table Browser. I followed all of the steps and got an output with the transcription start positions.

I did this for the EGFR and TERT genes. For EGFR I obtained the genomic coordinate chr7:55086724, and for TERT I got chr5:1253286.
For EGFR I think the position is correct. For TERT, however, I think the position I obtained through the Table Browser is not correct, because in the Genome Browser the coordinate chr5:1253286 is near the last exon (exon 12, 15, or 16, depending on the transcript variant) instead of the promoter. Furthermore, a document I obtained from TCGA, which lists genomic coordinates of different probes and TSS positions for specific genes, gives the TSS position for TERT as chr5:1295159. When I open the Genome Browser, this position lies within the promoter sequence, which makes more sense.

Based on this, could you clarify the TSS position for TERT? I would like to confirm which one is correct.

Thank you for your attention!


Best regards,

Joana Apolónio

Jonathan Casper

Aug 26, 2015, 6:12:27 PM
to joana apolonio, gen...@soe.ucsc.edu

Hello Joana,

Thank you for your question about determining the correct TSS position for the TERT gene. Based on the coordinates you list, I think you are probably looking at the UCSC Genes track for the human hg19 genome assembly (table "knownGene" in the UCSC Table Browser). Please let me know if that is not the case, but my statements below should also apply to many other assemblies and tracks.

It looks like the difficulty you are having is due to the structure of the gene table. Coordinates are relative to the start of the + strand. The txStart and txEnd coordinates are arranged so that the txStart coordinate is always the smaller coordinate. For genes like EGFR that appear on the + strand, this means that the txStart coordinate marks the 5' end of the gene at the start of transcription (or the start of our alignment of the transcript, at least). For genes like TERT that appear on the - strand, however, the txStart coordinate marks the 3' end of the transcript. This means that for genes on the - strand, it is the txEnd coordinate that marks the transcription start site (or at least the position closest to the TSS that we were able to align successfully). The txEnd coordinate for most of our TERT transcripts in the hg19 UCSC Genes track is 1295162 on chromosome 5, which matches closely to your expected value of chr5:1295159.
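The strand rule above can be sketched in a few lines of Python. This is only an illustration, not UCSC code: the function name `tss_position` and its `one_based` flag are made up for this example. It assumes the usual UCSC table convention of 0-based, half-open coordinates (txStart is 0-based, txEnd is exclusive), so the + strand TSS is shifted by one to get the 1-based position shown in the Genome Browser.

```python
def tss_position(strand, tx_start, tx_end, one_based=True):
    """Return the TSS for a gene-table record (hypothetical helper).

    Assumes tx_start/tx_end are 0-based, half-open table coordinates,
    with tx_start always the smaller value regardless of strand.
    """
    if strand == "+":
        # 5' end is the first base of the interval.
        return tx_start + 1 if one_based else tx_start
    if strand == "-":
        # 5' end is the last base of the interval.
        return tx_end if one_based else tx_end - 1
    raise ValueError(f"unexpected strand: {strand!r}")


# TERT values quoted in this thread (hg19, - strand).
tert_tss = tss_position("-", 1253286, 1295162)
print("TERT TSS: chr5:%d" % tert_tss)
```

With the TERT coordinates from this thread the - strand branch picks txEnd rather than txStart, which is the value close to the TCGA position.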

You can find the strand of a gene in the UCSC Table Browser by obtaining the full gene record (output format "all fields from selected table") and then looking at the "strand" column to see if the record is on the + or - strand. This can be seen visually on the UCSC Genome Browser by going to the location of the gene and then looking at the displayed transcripts. In the default view where the 5' end of the + strand is on the left edge, transcripts that align to the + strand will have right-pointing chevrons on them (>>>>>>>>>). Transcripts that align to the - strand will have left-pointing chevrons (<<<<<<<). More information about this display convention is available in this mailing list answer from our archives: https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/2u7SPsphmQU/_q6DeXGuEgAJ.
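If you save the "all fields from selected table" output to a file, the strand column can be used to pick the right coordinate for every record at once. The sketch below is a rough example under some assumptions: it expects a tab-separated file whose header line starts with "#" and contains the columns name, chrom, strand, txStart and txEnd, and it treats coordinates as 0-based, half-open. The function name `read_tss` and the transcript names in the sample are invented for illustration; only the TERT coordinates come from this thread.

```python
import csv
import io


def read_tss(handle):
    """Yield (name, chrom, strand, tss) with a 1-based TSS per record."""
    # Header line is assumed to look like "#name<TAB>chrom<TAB>strand...".
    header = handle.readline().lstrip("#").rstrip("\n").split("\t")
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t")
    for rec in reader:
        start, end = int(rec["txStart"]), int(rec["txEnd"])
        # + strand: TSS is txStart (shifted to 1-based); - strand: TSS is txEnd.
        tss = start + 1 if rec["strand"] == "+" else end
        yield rec["name"], rec["chrom"], rec["strand"], tss


# Tiny in-memory sample; transcript names and the + strand txEnd are placeholders.
sample = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\n"
    "example_minus_tx\tchr5\t-\t1253286\t1295162\n"
    "example_plus_tx\tchr7\t+\t55086724\t55087724\n"
)
records = list(read_tss(io.StringIO(sample)))
for name, chrom, strand, tss in records:
    print(f"{name}\t{chrom}:{tss}\t{strand}")
```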

You may also be interested to learn that the UCSC Table Browser provides options to download 5' and 3' UTR regions of your genes as well as upstream and downstream regions. These options all respect the strand of the gene record, and using them to ask for the 5' end of TERT will return a position range near chr5:1295159. You can find these options by selecting the output format "BED - browser extensible data" in the UCSC Table Browser.
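To show what "respecting the strand" means for an upstream region, here is a small sketch of the arithmetic. It is not the Table Browser's own implementation; `upstream_window` and its parameters are hypothetical names, and the coordinates are assumed to be 0-based, half-open as in BED. The window sits to the left of txStart on the + strand and to the right of txEnd on the - strand.

```python
def upstream_window(strand, tx_start, tx_end, size):
    """Return a 0-based, half-open (start, end) interval of `size` bases
    immediately upstream of the transcript, relative to its strand."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if strand == "+":
        # Upstream of a + strand gene lies at smaller coordinates.
        return max(0, tx_start - size), tx_start
    if strand == "-":
        # Upstream of a - strand gene lies at larger coordinates.
        return tx_end, tx_end + size
    raise ValueError(f"unexpected strand: {strand!r}")


# TERT (hg19, - strand) with the coordinates quoted in this thread.
tert_upstream = upstream_window("-", 1253286, 1295162, 1000)
print("chr5\t%d\t%d" % tert_upstream)
```

Note that this sketch does not clip the window at the end of the chromosome, which a real tool would need to do.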

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

