analyzing trinotate annotation of differentially expressed transcripts specific to sample groups

sumitra sivaprakasam

Jun 27, 2023, 9:32:14 PM

to trinityrnaseq-users

Hi Brian,

I performed differential expression analysis on my transcripts and also ran Trinotate to annotate all of them. After running Trinotate and summarizing the results with trinotate_report_summary.pl, I was able to view the top hits for taxonomy, top species, gene ontology, etc., but I believe this represents the transcripts overall.

Now I'm trying to extract the annotations for only the differentially expressed transcripts. To do that, I merged the .DE.subset file with the Trinotate output using a Python script, and then combined that output file (which has only the DE transcripts and their annotations) with the transcript.count_matrix data so I can manually plot and visualize the difference between the two groups (High vs. Low). My goal is to see which taxonomy and functional activities are associated with group 'High' and group 'Low'.
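
[Editor's sketch] The merge described above could look roughly like the following. It is only a minimal illustration, not a Trinity or Trinotate utility: the function names are hypothetical, the toy rows are made up, and it assumes the DE subset table has one header row with the transcript ID in the first column of every data row, and that the Trinotate report is tab-delimited with a `transcript_id` column.

```python
import csv

def read_de_ids(lines):
    """Collect transcript IDs from a DE subset table (ID assumed in column 1)."""
    rows = csv.reader(lines, delimiter="\t")
    next(rows, None)  # skip the header row
    return {row[0] for row in rows if row and row[0]}

def filter_annotations(lines, de_ids, id_column="transcript_id"):
    """Keep only the annotation rows whose transcript ID is in de_ids."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if row[id_column] in de_ids]

# Toy input standing in for the real files (values are illustrative only).
de_lines = [
    "sampleA\tsampleB\tlog2FoldChange\tpadj",
    "TRINITY_DN1_c0_g1_i1\tHigh\tLow\t3.2\t0.001",
    "TRINITY_DN7_c0_g1_i2\tHigh\tLow\t-2.5\t0.004",
]
annot_lines = [
    "#gene_id\ttranscript_id\tsprot_Top_BLASTX_hit",
    "TRINITY_DN1_c0_g1\tTRINITY_DN1_c0_g1_i1\thitA",
    "TRINITY_DN2_c0_g1\tTRINITY_DN2_c0_g1_i1\thitB",
    "TRINITY_DN7_c0_g1\tTRINITY_DN7_c0_g1_i2\t.",
]

de_ids = read_de_ids(de_lines)
de_annotations = filter_annotations(annot_lines, de_ids)
for row in de_annotations:
    print(row["transcript_id"], row["sprot_Top_BLASTX_hit"])
```

With real data, the two lists of lines would be replaced by open file handles, since both functions accept any iterable of text lines.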

Is there a built-in script within Trinity or Trinotate that would allow this kind of analysis, i.e. annotations specific to the DE transcripts associated with each sample group?

Thank you so much.

Brian Haas

Jun 28, 2023, 8:01:20 AM

to sumitra sivaprakasam, trinityrnaseq-users

Currently, we only have the GO enrichment set up for examining the
differentially expressed genes or transcripts.
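
[Editor's sketch] For readers unfamiliar with enrichment testing, the basic idea can be illustrated with a plain hypergeometric over-representation test. This is a simplified stand-in, not the method Trinity runs: Trinity's GO enrichment relies on GOseq, which additionally corrects for transcript length bias. The function name and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n features are drawn from N, of which K carry the GO term.

    k: DE features annotated with the term
    K: all features annotated with the term
    n: number of DE features
    N: number of features in the background
    """
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Illustrative numbers only: 8 of 30 DE genes carry a term that
# annotates 40 of 1000 background genes.
p_value = hypergeom_upper_tail(k=8, K=40, n=30, N=1000)
print(f"{p_value:.3g}")
```

A small p-value suggests the term is over-represented among the DE features; in practice the p-values across all tested terms would also need multiple-testing correction.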

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

sumitra sivaprakasam

Jul 4, 2023, 9:50:44 PM

to trinityrnaseq-users

Thanks, Brian. I tried out the GO enrichment at the gene level and it worked. After the differential analysis, I generated a heatmap with analyze_diff_expr.pl using --max_DE_genes_per_comparison 30. Is there a way to adjust the size of the axis labels? Also, I have 32 samples in total but only 16 are being displayed. How can I adjust the font size and include all 32 samples?

Brian Haas

Jul 5, 2023, 7:35:40 AM

to sumitra sivaprakasam, trinityrnaseq-users

Hi,

When you run the analyze_diff_expr.pl script, it should make an R
script that's used for making the heatmap. You can edit and rerun that
Rscript after adjusting for the axis text sizes or setting the page
dimensions for the pdf. Let me know if it gives you trouble. You can
send me the Rscript and the input it uses, and I can edit it if
needed.

sumitra sivaprakasam

Jul 12, 2023, 9:30:36 AM

to trinityrnaseq-users

Ahh, got it! Thanks, Brian. Also, what is the difference between expression.matrix and count.matrix? I believe the expression.matrix that I get after the differential analysis is normalized. I have two groups of interest with uneven sample sizes, and I'm trying to further analyze my data to see, for instance, how many (or what percentage of) transcripts in a sample group are associated with a particular bacterium or function, which phylum dominates, etc.

So I was wondering whether I should use expression.matrix or count.matrix for such an analysis.

Brian Haas

Jul 12, 2023, 9:31:55 AM

to trinityrnaseq-users

The count matrix is mostly used for the differential expression analysis, as DESeq2 and related tools require the actual count data to fit their statistical models.

For most everything else, we use the normalized expression matrix.

best,

~b
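
[Editor's sketch] One way to compare groups of uneven size with the normalized expression matrix is to average each transcript within its group first and then express each annotation label (for example a phylum) as a percentage of the group's total. The sketch below is an assumption-laden illustration, not part of Trinity: the function names, sample names, and labels are hypothetical. It assumes a tab-delimited matrix whose header row starts with an empty field followed by sample names, and it assumes every sample in the group mapping also appears in the matrix.

```python
from collections import defaultdict

def parse_matrix(lines):
    """Parse a tab-delimited matrix: header of sample names, then ID + values."""
    lines = iter(lines)
    samples = next(lines).rstrip("\n").split("\t")[1:]  # first field assumed empty
    matrix = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        matrix[fields[0]] = dict(zip(samples, map(float, fields[1:])))
    return matrix

def group_shares(matrix, sample_group, transcript_label):
    """Percent of each group's mean expression assigned to each label."""
    group_sizes = defaultdict(int)
    for group in sample_group.values():
        group_sizes[group] += 1
    totals = defaultdict(lambda: defaultdict(float))
    for transcript, values in matrix.items():
        label = transcript_label.get(transcript, "unannotated")
        for sample, value in values.items():
            group = sample_group[sample]
            # Dividing by group size yields a per-group mean, so a larger
            # group does not dominate simply by having more samples.
            totals[group][label] += value / group_sizes[group]
    shares = {}
    for group, by_label in totals.items():
        group_total = sum(by_label.values())
        if group_total:
            shares[group] = {lab: 100 * v / group_total for lab, v in by_label.items()}
    return shares

# Toy data: two 'High' samples and one 'Low' sample (values are illustrative).
matrix = parse_matrix(["\tH1\tH2\tL1", "t1\t10\t30\t5", "t2\t10\t10\t15"])
sample_group = {"H1": "High", "H2": "High", "L1": "Low"}
transcript_label = {"t1": "Proteobacteria", "t2": "Firmicutes"}
shares = group_shares(matrix, sample_group, transcript_label)
for group, by_label in shares.items():
    print(group, {label: round(pct, 1) for label, pct in by_label.items()})
```

Transcripts without a label are pooled under "unannotated" rather than dropped, so the percentages for each group always sum to 100.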
