Question about UTR from UCSC Table Browser.


胡新蕾

Jun 26, 2018, 11:06:38 AM
to genome
Hello,

I used the UCSC Table Browser to retrieve 5' UTRs and 3' UTRs for hg19. The track I chose is GENCODE Genes V19 and the table is Basic (wgEncodeGencodeBasicV19). The defined region is the main chromosomes (chr1, chr2, ..., chrX, chrY, chrM), and the output format is BED. I also downloaded the GENCODE V19 GTF file (comprehensive gene annotation, CHR regions) from https://www.gencodegenes.org/releases/19.html to verify the BED output.

There is one thing I don't understand:

The output shows that one transcript can have more than one 5' UTR (or 3' UTR).
According to https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/TNsxWD5Gpv8/PX4W1yinvcMJ, I know that "All exons of non-coding genes on the + strand are listed as 3’ UTRs and all exons of non-coding genes on the - strand are listed as 5’ UTRs." This explains multiple UTRs for non-coding genes, and it seems the same could apply to pseudogenes. For protein-coding genes, I wonder whether the same thing happens to their non-coding transcripts, e.g. ENST00000487214. The transcript type of this transcript is "processed transcript" in the GTF file I downloaded. In the 5' UTR BED file generated by the Table Browser, it has seven 5' UTR entries:

chr1    889805  889903  ENST00000487214.1_utr5_0_0_chr1_889806_r        0       -
chr1    891302  891393  ENST00000487214.1_utr5_1_0_chr1_891303_r        0       -
chr1    891474  891595  ENST00000487214.1_utr5_2_0_chr1_891475_r        0       -
chr1    892273  892405  ENST00000487214.1_utr5_3_0_chr1_892274_r        0       -
chr1    892478  892653  ENST00000487214.1_utr5_4_0_chr1_892479_r        0       -
chr1    894308  894461  ENST00000487214.1_utr5_5_0_chr1_894309_r        0       -
chr1    894594  894689  ENST00000487214.1_utr5_6_0_chr1_894595_r        0       -
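
To check how many UTR entries each transcript gets in a file like this, the rows can be grouped by transcript ID. The following is a minimal sketch, assuming the name column always follows the `<transcript>_utr5_...` pattern seen in the rows above (the function name is hypothetical, and the pattern is inferred from this output rather than taken from documentation):

```python
from collections import Counter

# The seven rows shown above, with columns separated by whitespace.
BED_ROWS = """\
chr1 889805 889903 ENST00000487214.1_utr5_0_0_chr1_889806_r 0 -
chr1 891302 891393 ENST00000487214.1_utr5_1_0_chr1_891303_r 0 -
chr1 891474 891595 ENST00000487214.1_utr5_2_0_chr1_891475_r 0 -
chr1 892273 892405 ENST00000487214.1_utr5_3_0_chr1_892274_r 0 -
chr1 892478 892653 ENST00000487214.1_utr5_4_0_chr1_892479_r 0 -
chr1 894308 894461 ENST00000487214.1_utr5_5_0_chr1_894309_r 0 -
chr1 894594 894689 ENST00000487214.1_utr5_6_0_chr1_894595_r 0 -
"""


def utr_blocks_per_transcript(bed_text):
    """Count BED rows per transcript ID taken from the name column."""
    counts = Counter()
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Assumption: everything before "_utr" in the name is the transcript ID.
        transcript_id = fields[3].split("_utr", 1)[0]
        counts[transcript_id] += 1
    return counts


print(utr_blocks_per_transcript(BED_ROWS))
```

A count above one is not by itself a sign of a non-coding transcript, since a UTR of a coding transcript can also span several exons; the count only shows which transcripts to look up in the GTF.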

So what should I do to get a correct UTR BED file? Should I follow the steps shown in the linked thread?
Also, is this the same reason that some transcripts in the UTR BED file from the Table Browser do not have UTRs in the GENCODE GTF file?

Maybe I am missing something. Thank you very much for your help!




Matthew Speir

Jun 28, 2018, 7:23:51 PM
to 胡新蕾, genome
Hello!

Thank you for your question about obtaining UTR information from the UCSC Genome Browser.

What that email describes for non-coding RNAs also applies to pseudogenes in the GENCODE V19 tables in our database, as both are represented in the same way in the underlying tables. If you are looking to filter these types of genes out of your output, the final email from Steve in the thread you linked (https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/TNsxWD5Gpv8/PX4W1yinvcMJ) provides instructions for this:

1. From the main Table Browser screen, on the filter line, click the “create” button
2. In the “Free-form query” box, enter the following:  cdsStart != cdsEnd
3. Click the “submit” button
This will eliminate any non-coding genes from your results.
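
The same `cdsStart != cdsEnd` test can also be applied offline to a table dump. The following is a minimal sketch, assuming a tab-separated export whose first line is a `#`-prefixed header containing `name`, `cdsStart`, and `cdsEnd` columns. The rows and transcript names below are made-up toy values, not real annotations, and the function name is hypothetical:

```python
import csv
import io

# Toy rows with invented names and coordinates; a real table dump has more columns.
COLUMNS = ["name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd"]
ROWS = [
    ["TOY_NONCODING.1", "chr1", "-", "1000", "5000", "5000", "5000"],
    ["TOY_CODING.1", "chr1", "+", "1000", "5000", "1200", "4800"],
]
TABLE_TEXT = (
    "#" + "\t".join(COLUMNS) + "\n"
    + "\n".join("\t".join(row) for row in ROWS) + "\n"
)


def coding_transcript_names(table_text):
    """Return names of rows where cdsStart differs from cdsEnd."""
    lines = table_text.splitlines()
    # Column names come from the header line, so extra columns are harmless.
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(
        io.StringIO("\n".join(lines[1:])), fieldnames=header, delimiter="\t"
    )
    return [
        row["name"]
        for row in reader
        if int(row["cdsStart"]) != int(row["cdsEnd"])
    ]


print(coding_transcript_names(TABLE_TEXT))
```

The resulting list of names could then be used to keep only the matching rows in a UTR BED file.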

It would also be interesting to know what you are doing with the output as that may help us provide a better answer to your question. Are you just attempting to convert the GTF files to bed?

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining









--
Matthew Speir
Outreach, User Experience, Quality Assurance and User Support
HCA, CIRM, and UCSC Genome Browser
UCSC Genomics Institute