3'UTR coordinates


Dannys Martínez Herrera

May 18, 2016, 11:33:25 AM
to gen...@soe.ucsc.edu


Hi all. I'm trying to retrieve the 3' UTR coordinates for all human transcripts (hg19). I followed these steps:

Select species and assembly; Group: Genes and Gene Predictions; Track: GENCODE Genes V19; Output format: BED; Output file: myutr.bed

-> Get Output

-> Create one BED record per: 3' UTR Exons

-> Get BED

But the output BED file instead contains one record for every exon, each labeled as a 3' UTR, as in this example:

chr1    11868   12227   ENST00000456328.2_utr3_0_0_chr1_11869_f 0       +
chr1    12612   12721   ENST00000456328.2_utr3_1_0_chr1_12613_f 0       +
chr1    13220   14409   ENST00000456328.2_utr3_2_0_chr1_13221_f 0       +


Is there a way to get the Ensembl transcript ID, 3' UTR start, and 3' UTR end from the Table Browser? Thank you in advance,


 Danny



Christopher Lee

May 18, 2016, 12:56:21 PM
to Dannys Martínez Herrera, UCSC Genome Browser Discussion List

Hi Danny,

Thank you for your question about obtaining the Ensembl transcript ID, 3' UTR start,
and 3' UTR end from the Table Browser. Your Table Browser query includes non-coding
genes in the output. Because non-coding transcripts are untranslated along their whole
length, the query returns every exon of a non-coding gene in addition to the true
3' UTR exons of coding genes. If you instead filter your results to keep only coding
transcripts, you will retrieve just the 3' UTR positions of protein-coding genes.

Here is an example session that illustrates a Table Browser query returning all 3' UTR
exons vs coding-only 3' UTR exons:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=chmalee&hgS_otherUserSessionName=hg19CodingVsNonCodingUTRs

Notice how the top track, which contains all 3' UTR exons, has items not only for
the 3' UTR exons of the SOD1 gene but also for every exon of the green non-coding
genes. In contrast, the bottom track comes from a Table Browser query filtered to
include only coding genes, and its items correspond exactly to the 3' UTR exons
of the SOD1 gene.

To obtain the 3' UTR positions of only coding genes, follow the steps below:

1. Navigate to the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables)
2. Select the hg19 assembly and the Genes and Gene Predictions group
3. Select the GENCODE Genes V19 track and choose the Basic table (should be the default)
4. If you are interested in a particular region, select that region using
the define regions box; otherwise choose "genome"
5. Click the button "create" next to "filter"
6. Allow filtering from the linked table wgEncodeGencodeAttrsV19
7. In the transcriptClass field under the "hg19.wgEncodeGencodeAttrsV19 based filters" section,
enter "coding" into the text box, so "transcriptClass does match coding", then click "submit"
8. Under output format choose "BED - browser extensible data", enter a name for your file, and
click "get output"
9. On the "Output wgEncodeGencodeBasicV19 as BED" page, choose "3' UTR Exons", and click "get BED"
to download your file
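
The resulting BED file still has one record per 3' UTR exon. If a single start and end per transcript is wanted, the records can be collapsed afterwards. What follows is a minimal Python sketch, not an official UCSC tool. It assumes the item names follow the pattern seen in the example output above (transcript ID, then "_utr3_", then two numeric fields), and the function name utr3_spans is only illustrative. Coordinates stay in BED convention (0-based start, exclusive end), and for a spliced 3' UTR the reported span also covers the introns between the UTR exons.

```python
import re
from collections import OrderedDict

# Assumed item-name layout, inferred from the example output in this thread:
#   <transcriptID>_utr3_<number>_<number>_<chrom>_<position>_<strand flag>
NAME_RE = re.compile(r"^(?P<tx>.+?)_utr3_\d+_\d+_")


def utr3_spans(lines):
    """Collapse per-exon 3' UTR BED records into one span per transcript.

    Returns a list of (chrom, start, end, transcript_id, strand) tuples in
    order of first appearance. Start and end are the smallest start and the
    largest end over all 3' UTR exons of that transcript.
    """
    spans = OrderedDict()
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        strand = fields[5] if len(fields) > 5 else "."
        match = NAME_RE.match(name)
        if match is None:
            continue  # not a 3' UTR item in the assumed naming scheme
        # Key on chromosome and strand too, in case the same transcript ID
        # is listed at more than one location.
        key = (match.group("tx"), chrom, strand)
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return [(chrom, s, e, tx, strand)
            for (tx, chrom, strand), (s, e) in spans.items()]


if __name__ == "__main__":
    example = [
        "chr1    11868   12227   ENST00000456328.2_utr3_0_0_chr1_11869_f 0       +",
        "chr1    12612   12721   ENST00000456328.2_utr3_1_0_chr1_12613_f 0       +",
        "chr1    13220   14409   ENST00000456328.2_utr3_2_0_chr1_13221_f 0       +",
    ]
    for chrom, start, end, tx, strand in utr3_spans(example):
        print(tx, chrom, start, end, strand, sep="\t")
```

To run it on a downloaded file, pass an open file handle, for example utr3_spans(open("myutr.bed")).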

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any
further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address
are archived on a publicly-accessible forum. If your question includes sensitive data, you
may send it instead to genom...@soe.ucsc.edu.



Dannys Martínez Herrera

Jun 1, 2016, 11:26:12 AM
to gen...@soe.ucsc.edu
Hi everyone! Thanks in advance for the help. I need to download the canonical transcript sequence for a set of genes of interest, corresponding to Ensembl 75. I know that the canonical transcripts cannot be downloaded using BioMart, only with the Perl API. How can I specify the Ensembl version in the code? Thanks!!

 Danny

Emily Perry

Jun 1, 2016, 11:44:46 AM
to gen...@soe.ucsc.edu, dmhe...@unav.es

Hi Danny

Hope UCSC don't mind me answering this one from Ensembl. Might have been better directed to us at help...@ensembl.org. UCSC may also have a Table Browser answer for you.

The APIs are matched to the databases, so to get the e75 database you'll need the e75 API. You can clone these from our Github:

https://github.com/Ensembl/

The API modules are found in the repositories ensembl, ensembl-variation, ensembl-compara and ensembl-funcgen. When you open each one, you'll see the Branch menu on the left above the file list – choose release/75 from the list, then download.

All the best

Emily

Ensembl Outreach

-- 
Dr Emily Perry (Pritchard)
Ensembl Outreach Project Leader

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK 

Matthew Speir

Jun 1, 2016, 2:08:14 PM
to Emily Perry, gen...@soe.ucsc.edu, dmhe...@unav.es
Hello Danny,

As Emily stated, questions about Ensembl's API and BioMart tool should be directed to their mailing list at help...@ensembl.org.

Additionally, we do not keep information regarding canonical transcripts for the Ensembl tracks in the UCSC Genome Browser database.

Thank you for your question and for using the UCSC Genome Browser. I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group