Predicting 3'UTRs from StringTie output

ylee

Apr 24, 2019, 12:42:53 PM

to TransDecoder-users

Dear Brian and TransDecoder users,

I am currently trying to use TransDecoder to predict the 3'UTRs of novel transcripts assembled with StringTie, and I am a little confused as to which scripts in the /utils directory are best to use. Apologies if this question has already been answered, but I am having trouble finding my way and would be very grateful for any advice on my current analyses.

In short, I wanted to construct a more comprehensive transcriptome using RNA-seq data from samples treated with and without a stress stimulus. I followed the tutorial below for reference-guided transcript assembly:

https://github.com/griffithlab/rnaseq_tutorial/wiki/Differential-Expression

I performed alignments using STAR, assembled transcripts with StringTie, merged the GTFs for all samples, and then re-ran StringTie using the merged GTF as the reference guide. Using this GTF I then ran TransDecoder.LongOrfs and TransDecoder.Predict. I am now at the point where I have generated the genome-based coding region annotation file, but am I right in assuming that it has multiple entries for a single transcript? From reading around, I know that scripts are provided to help select and report the longest ORF for each transcript, but I can't quite figure out how to use them, or how select_best_ORFs_per_transcript.pl differs from get_longest_ORF_per_transcript.pl.

My main goal is essentially to obtain the FASTA sequences of all 3'UTRs for these novel StringTie transcripts, but I am unsure which of the resulting output files to pull them from. If anyone could point me in the right direction, it would be much appreciated.
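
To make the goal concrete, here is a minimal sketch of the kind of extraction in question. It assumes that the transcript-level GFF3 written by TransDecoder.Predict contains three_prime_UTR features in transcript coordinates; that should be checked against the actual output. The function names and the toy data are hypothetical and are not part of TransDecoder.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def three_prime_utrs(transcripts, gff3_lines):
    """Collect 3'UTR sequences keyed by the Parent (ORF/mRNA) ID.

    transcripts: dict mapping transcript ID -> nucleotide sequence
    gff3_lines:  iterable of GFF3 lines in transcript coordinates
                 (assumption: 3'UTRs appear as 'three_prime_UTR' features)
    """
    utrs = {}
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "three_prime_UTR":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent", seqid)
        # GFF3 coordinates are 1-based and inclusive.
        seq = transcripts[seqid][start - 1:end]
        if strand == "-":
            # ORF lies on the reverse strand of the assembled transcript.
            seq = reverse_complement(seq)
        utrs[parent] = seq
    return utrs


# Toy data: t1 carries ATGAAATAA on the forward strand, t2 on the reverse.
transcripts = {
    "t1": "GGATGAAATAACCCC",
    "t2": "GGGGTTATTTCATCC",
}
gff3 = [
    "t1\ttransdecoder\tCDS\t3\t11\t.\t+\t0\tID=cds.t1.p1;Parent=t1.p1",
    "t1\ttransdecoder\tthree_prime_UTR\t12\t15\t.\t+\t.\tID=t1.p1.utr3p1;Parent=t1.p1",
    "t2\ttransdecoder\tthree_prime_UTR\t1\t4\t.\t-\t.\tID=t2.p1.utr3p1;Parent=t2.p1",
]
result = three_prime_utrs(transcripts, gff3)
for orf_id, seq in result.items():
    print(f">{orf_id}_3UTR\n{seq}")
```

Keying the result by the Parent attribute keeps one entry per predicted ORF, so a transcript with several ORFs yields several candidate 3'UTRs; choosing a single ORF per transcript is a separate step.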

Many thanks for the help,

Yan
