https://github.com/griffithlab/rnaseq_tutorial/wiki/Differential-Expression
I performed alignments using STAR, then used StringTie to assemble transcripts, merged the GTFs for all samples, and re-ran StringTie using the merged GTF as the reference guide. Using this GTF I then ran TransDecoder.LongOrfs and TransDecoder.Predict.

I am now at the point where I have generated the genome-based coding region annotation file. Am I right in assuming that this can have multiple ORF entries for a single transcript? From reading around, I know that there are scripts provided to help select and report the longest ORF for each transcript, but I can't quite figure out how to use them, or how select_best_ORFs_per_transcript.pl differs from get_longest_ORF_per_transcript.pl.

My main goal is to obtain the FASTA sequences of all 3'UTRs for these novel StringTie transcripts, but I am unsure which of the resulting outputs to pull them from. If anyone could point me in the right direction, it would be much appreciated.
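
To make the question concrete, here is a minimal sketch of the selection I have in mind. It does not use either of the Perl scripts above. It assumes the genome-based GFF3 has `CDS` and `three_prime_UTR` rows whose `Parent` attribute is the ORF (mRNA) ID, and that ORF IDs are the transcript ID plus a `.pN` suffix (e.g. `STRG.1.1.p1`); both are assumptions about my output, and the function names are just ones I made up for illustration.

```python
import re
from collections import defaultdict


def parse_attributes(field):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def three_prime_utrs_of_longest_orfs(gff_lines):
    """Return {transcript_id: [(seqid, start, end, strand), ...]} holding the
    3'UTR intervals of the ORF with the largest total CDS length per transcript."""
    cds_length = defaultdict(int)  # ORF ID -> summed CDS length
    utr_rows = defaultdict(list)   # ORF ID -> 3'UTR intervals

    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 feature row
        seqid, _, ftype, start, end, _, strand, _, attrs = cols[:9]
        parent = parse_attributes(attrs).get("Parent")
        if parent is None:
            continue
        start, end = int(start), int(end)
        if ftype == "CDS":
            cds_length[parent] += end - start + 1  # GFF3 coordinates are inclusive
        elif ftype == "three_prime_UTR":
            utr_rows[parent].append((seqid, start, end, strand))

    best = {}  # transcript ID -> ORF ID with the longest CDS seen so far
    for orf_id, length in cds_length.items():
        # Assumption: the transcript ID is the ORF ID without its '.pN' suffix.
        transcript_id = re.sub(r"\.p\d+$", "", orf_id)
        if transcript_id not in best or length > cds_length[best[transcript_id]]:
            best[transcript_id] = orf_id

    return {tid: sorted(utr_rows.get(orf_id, [])) for tid, orf_id in best.items()}
```

The resulting intervals would still need to be turned into sequence against the genome (for example with something like bedtools getfasta), and ties between equally long ORFs are resolved here simply by keeping the first one encountered.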
Many thanks for the help,
Yan