Hello,
I am using the UCSC Table Browser to retrieve the 5' UTRs and 3' UTRs for hg19. The track I chose is GENCODE Genes V19 and the table is Basic (wgEncodeGencodeBasicV19). The defined regions are the main chromosomes (chr1, chr2, ..., chrX, chrY, chrM), and the output format is BED. To verify the BED files I get, I also downloaded the GENCODE V19 GTF file (comprehensive gene annotation, CHR regions) from
https://www.gencodegenes.org/releases/19.html.
There is one thing I don't understand:
The output shows that a single transcript can have more than one 5' UTR (or 3' UTR).
According to this thread:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/TNsxWD5Gpv8/PX4W1yinvcMJ
"All exons of non-coding genes on the + strand are listed as 3’ UTRs and all exons of non-coding genes on the - strand are listed as 5’ UTRs."
This explains the multiple UTRs for non-coding genes, and it seems the same could happen for pseudogenes. For protein-coding genes, I wonder whether the same thing happens to their non-coding transcripts, e.g. ENST00000487214. The transcript type of this transcript is "processed transcript" in the GTF file I downloaded. In the 5' UTR BED file generated by the Table Browser, it is listed with seven 5' UTRs:
chr1 889805 889903 ENST00000487214.1_utr5_0_0_chr1_889806_r 0 -
chr1 891302 891393 ENST00000487214.1_utr5_1_0_chr1_891303_r 0 -
chr1 891474 891595 ENST00000487214.1_utr5_2_0_chr1_891475_r 0 -
chr1 892273 892405 ENST00000487214.1_utr5_3_0_chr1_892274_r 0 -
chr1 892478 892653 ENST00000487214.1_utr5_4_0_chr1_892479_r 0 -
chr1 894308 894461 ENST00000487214.1_utr5_5_0_chr1_894309_r 0 -
chr1 894594 894689 ENST00000487214.1_utr5_6_0_chr1_894595_r 0 -
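In case it helps, this is roughly how I counted the blocks per transcript. It is only a minimal sketch: the function names are my own, and it assumes the transcript ID is everything before "_utr" in the name column, as in the lines above.

```python
from collections import Counter


def transcript_id_from_name(name):
    # Assumption: Table Browser UTR names look like
    # "ENST00000487214.1_utr5_0_0_chr1_889806_r", so the transcript ID
    # is the part before the first "_utr".
    return name.split("_utr", 1)[0]


def count_utr_blocks(bed_lines):
    """Count how many BED records each transcript has."""
    counts = Counter()
    for line in bed_lines:
        fields = line.split()
        # Skip header lines and anything without a name column.
        if len(fields) < 4 or fields[0].startswith(("track", "browser", "#")):
            continue
        counts[transcript_id_from_name(fields[3])] += 1
    return counts
```

Transcripts with a count above 1 are the ones I am asking about; I realise a spliced UTR in a coding transcript could also produce several blocks, so the count alone does not tell the two cases apart.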
So what should I do to get a correct UTR BED file? Should I follow the same steps as in the thread linked above?
Also, is this the same reason that transcripts in the UTR BED files from the Table Browser have no UTR annotated in the GENCODE GTF file?
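To make the comparison with the GTF concrete, here is a rough sketch of the filtering I have in mind. It assumes that a transcript counts as coding only if it has at least one CDS feature in the GTF, and that the transcript ID in the BED name column is everything before "_utr". The function names are hypothetical, and I am not sure this is the recommended approach.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def coding_transcripts(gtf_lines):
    """Collect IDs of transcripts that have at least one CDS feature."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def keep_coding_utrs(bed_lines, coding_ids):
    """Keep only BED records whose transcript is in coding_ids."""
    kept = []
    for line in bed_lines:
        fields = line.split()
        # Assumption: the name column starts with the transcript ID,
        # followed by "_utr5_..." or "_utr3_...".
        if len(fields) >= 4 and fields[3].split("_utr", 1)[0] in coding_ids:
            kept.append(line)
    return kept
```

With this, the seven ENST00000487214.1 records above would be dropped if that transcript has no CDS lines in the GTF.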
Maybe I am missing something. Thank you very much for your help!