Heatmap for GO terms


Giorgio Casaburi

Jul 6, 2015, 4:30:24 PM
to trinityrn...@googlegroups.com
Hi all,

I have run the entire Trinotate pipeline, followed by GO enrichment with GOseq. The pipeline generates a heatmap with 120 significant genes (reported as transcript IDs). For some reason the exact same heatmap was generated when doing the GO enrichment analysis, and it came out together with the .enriched and .depleted files. (Just to clarify, the script was "analyze_diff_expr.pl", run with and without the GO parameters.) I thought I would get another heatmap reporting the GO terms instead of just the transcript IDs, which is not really informative. What's the way to go? Is there any way to have either the transcripts clustered by GO (so that I can add the GO terms in, say, Illustrator) or the tools generate the heatmap with the GO terms on it?
What do you guys do in this situation?

Thanks a lot in advance!



Brian Haas

Jul 6, 2015, 10:38:32 PM
to Giorgio Casaburi, trinityrn...@googlegroups.com
Hi Giorgio,

We don't have another heatmap or other visual representation for the gene ontology enrichment analysis.  If you want to experiment with something, you could try using our PtR script (which makes the heatmaps).

trinityrnaseq/Analysis/DifferentialExpression/PtR

There's usage info for it, and there's some basic description of it in our google groups thread (search for PtR) - and documentation is forthcoming.

best,

~b


--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Giorgio Casaburi

Jul 7, 2015, 9:26:46 AM
to trinityrn...@googlegroups.com, giorgio...@gmail.com

Thanks Brian! The script needs a matrix (e.g. matrix.RAW.normalized.FPKM) to create the heatmap. The matrices generated by following the Trinotate tutorial contain only the transcript ID, not a gene ID, GO ID, or any other annotation. What I am looking for is a script that can generate that matrix with the actual gene ID or GO annotation (e.g. from SwissProt), somehow clustered if possible, like the examples in the attached screenshot.
Is there any way to get something like these, or is it impossible?

Screen Shot 2015-07-07 at 9.21.57 AM.png

Brian Haas

Jul 7, 2015, 9:37:23 AM
to Giorgio Casaburi, trinityrn...@googlegroups.com
Hi Giorgio,

Something like that would surely be possible if someone wants to work on coding it up.  It's not something that I'll personally have time to do in the near-term, though.

Any takers?

~brian

Giorgio Casaburi

Jul 7, 2015, 9:43:28 AM
to Brian Haas, trinityrn...@googlegroups.com
Brian:
I think that would be extremely useful for the community. 

Anyway, for now, what's the best strategy to cross-reference the Trinotate annotation with the matrix generated by the edgeR/DE pipeline? How do you deal with that?
--
Giorgio Casaburi, Ph.D.
Postdoctoral Research Associate
Department of Microbiology and Cell Science
University of Florida
Space Life Sciences Lab
505 Odyssey Way - Exploration Park

Brian Haas

Jul 7, 2015, 9:45:25 AM
to Giorgio Casaburi, trinityrn...@googlegroups.com
What I've been doing is modifying the transcript (or gene) accession to include the various Trinotate attributes.    I'll dig around for the code and instructions on how to do it shortly.

best,

~b

Giorgio Casaburi

Jul 7, 2015, 9:51:47 AM
to Brian Haas, trinityrn...@googlegroups.com
Super, Brian, please share some code when you have a chance. You can't imagine how many people get stuck at this particular step; a couple of scripts would be of immense help.

THANKS,
~G

Brian Haas

Jul 7, 2015, 10:05:16 AM
to Giorgio Casaburi, trinityrn...@googlegroups.com
So here's what I typically do:

1.  create a mapping between the original Trinity identifiers and a version that is loaded with Trinotate attributes:

TRINOTATE/util/Trinotate_get_feature_name_encoding_attributes.pl 

usage: ./Trinotate_get_feature_name_encoding_attributes.pl Trinotate.xls > trinotate.gene_id_mappings


2.  Update a matrix, replacing the original Trinity identifiers with those encoding the Trinotate attributes:


trinityrnaseq/Analysis/DifferentialExpression/rename_matrix_feature_identifiers.pl  

###############################################

#  Usage: ./rename_matrix_feature_identifiers.pl matrix.txt  new_feature_id_mapping.txt
#
#  The 'new_feature_id_mapping.txt' file has the format:
#
#   current_identifier <tab> new_identifier
#   ....
#
#
#   Only those entries with new names listed will be updated, the rest stay unchanged.
#
#

#################################################


And then use PtR to remake a heatmap:

trinityrnaseq/Analysis/DifferentialExpression/PtR --matrix updated_matrix.txt --log2 --heatmap --gene_dist euclidean --sample_dist euclidean --min_colSums 0 --min_rowSums 0

If you have certain groups of genes that you want to highlight in the heatmap - such as particular gene ontology categories, you can explore the --gene_factors parameter (see PtR usage info).
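
For readers who want to see what step 2 amounts to, here is a minimal Python sketch of the same idea. It is not the code of rename_matrix_feature_identifiers.pl. The function names load_mapping and rename_rows and the tiny sample data are made up for illustration, and it assumes a tab-delimited matrix whose first line is a header and whose first column holds the feature identifier.

```python
import csv
import io


def load_mapping(handle):
    """Read 'current_identifier<TAB>new_identifier' lines into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[0]:
            mapping[row[0]] = row[1]
    return mapping


def rename_rows(matrix_handle, mapping, out_handle):
    """Copy the matrix, replacing the first column wherever a new name is listed."""
    # The header line (sample names) is passed through untouched.
    out_handle.write(matrix_handle.readline())
    for line in matrix_handle:
        fields = line.rstrip("\n").split("\t")
        # Identifiers without an entry in the mapping stay unchanged.
        fields[0] = mapping.get(fields[0], fields[0])
        out_handle.write("\t".join(fields) + "\n")


# Hypothetical in-memory data standing in for the mapping and matrix files.
mapping_text = "c1000_g1_i1\tc1000_g1_i1^bZIP_Maf\n"
matrix_text = (
    "\tsampleA\tsampleB\n"
    "c1000_g1_i1\t1.5\t3.0\n"
    "c1000_g1_i2\t0.0\t2.0\n"
)

renamed = io.StringIO()
rename_rows(io.StringIO(matrix_text), load_mapping(io.StringIO(mapping_text)), renamed)
print(renamed.getvalue(), end="")
```

With real files, the io.StringIO objects would be replaced by open() handles on the matrix, the mapping file, and the output file.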


I hope this helps,


~brian



Giorgio Casaburi

Jul 7, 2015, 10:13:23 AM
to trinityrn...@googlegroups.com, giorgio...@gmail.com
Brian:

This is wonderful! I can't thank you enough! I hope others will run into this approach on their way; it saves tons of time.

Thanks a lot for all your kind help and assistance.

~Giorgio

Brian Haas

Jul 7, 2015, 10:15:33 AM
to Giorgio Casaburi, trinityrn...@googlegroups.com
Sure thing.  Eventually this will make it into the regular Trinity documentation.

best,

~b

Giorgio Casaburi

Jul 7, 2015, 11:35:06 AM
to trinityrn...@googlegroups.com, giorgio...@gmail.com
BRIAN FYI:

 
The pipeline you suggested did end up producing a heatmap, but unfortunately it is labeled only with transcript_id (second column of trinotate_report.xls) instead of #gene_id (first column), whereas the goal was to have GO (11th column) or the SwissProt annotation (3rd column):

#gene_id    transcript_id    sprot_Top_BLASTX_hit    TrEMBL_Top_BLASTX_hit   RNAMMER prot_id prot_coords sprot_Top_BLASTP_hit TrEMBL_Top_BLASTP_hit Pfam SignalP TmHMM eggnog gene_ontology_blast gene_ontology_pfam transcript peptide

 
Does that make sense? I have even tried to switch the columns but the first script (Trinotate_get_feature_name_encoding_attributes.pl) gave me an error.

~Giorgio

Brian Haas

Jul 7, 2015, 4:22:39 PM
to Giorgio Casaburi, trinityrn...@googlegroups.com
Hi Giorgio,

Sorry - I couldn't follow.  Did the script:
Trinotate_get_feature_name_encoding_attributes.pl Trinotate.xls > trinotate.gene_id_mappings

generate a useful output file?  Can you share the first few lines of the output?

best,

~brian

Matthew Neave

Nov 5, 2015, 1:07:36 AM
to trinityrnaseq-users, giorgio...@gmail.com
Hi Brian,

I was thinking of doing something similar to Giorgio but I'm having a similar problem.

The output of the script "Trinotate_get_feature_name_encoding_attributes.pl" is:

c1000_g1        c1000_g1^bZIP_Maf
c1000_g1_i1     c1000_g1_i1^bZIP_Maf
c1000_g1_i2     c1000_g1_i2
c1001_g1        c1001_g1^Pep_M12B_propep
c1001_g1_i1     c1001_g1_i1^Pep_M12B_propep
c10030_g1       c10030_g1^Rhodanese
c10030_g1_i1    c10030_g1_i1^Rhodanese

I think the BLAST results aren't found because the Perl script (line 55) searches for "Top_BLASTX_hit" whereas the annotation report header is "sprot_Top_BLASTX_hit". I didn't annotate for SignalP, etc., so Pfam is correctly the only other expected hit.
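
To illustrate the suspected mismatch, here is a small Python sketch of a header lookup that tolerates the "sprot_" prefix. It is only a guess at the kind of fix needed and is not taken from the Perl script; find_column is a hypothetical helper, and the header below is shortened from the report header quoted earlier in this thread.

```python
def find_column(header_fields, suffix):
    """Return the index of the first header field ending with `suffix`, or None."""
    for index, name in enumerate(header_fields):
        # Strip a leading '#' so '#gene_id' and 'gene_id' are treated alike.
        if name.lstrip("#").endswith(suffix):
            return index
    return None


# Shortened header from the newer report format.
header = "#gene_id\ttranscript_id\tsprot_Top_BLASTX_hit\tTrEMBL_Top_BLASTX_hit\tRNAMMER".split("\t")

# An exact match on "Top_BLASTX_hit" would fail here; a suffix match
# picks up the sprot_ column because it comes first in the header.
blastx_col = find_column(header, "Top_BLASTX_hit")
print(header[blastx_col])
```

Matching on the suffix also keeps working with an older report whose column is named just "Top_BLASTX_hit".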

From the perl script, I can't see where the GO annotations would be extracted for a heat map similar to Giorgio's earlier attachment? I was also wondering how you'd deal with the fact that some transcripts have many GO terms associated with them?
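
One possible way to handle transcripts with many GO terms is to build a long table with one row per (GO term, transcript) pair, so that a transcript simply appears under every term it carries. The Python sketch below assumes that a GO cell in the Trinotate report holds entries separated by backticks, each of the form GO:id^namespace^description, with "." for an empty cell. That format is an assumption to check against your own report, and the example rows and function names are invented for illustration.

```python
from collections import defaultdict


def parse_go_field(field):
    """Split one GO cell into (go_id, namespace, description) tuples."""
    terms = []
    if field in ("", "."):
        return terms
    for entry in field.split("`"):
        parts = entry.split("^")
        # Keep only well-formed entries; anything else is skipped.
        if len(parts) == 3 and parts[0].startswith("GO:"):
            terms.append(tuple(parts))
    return terms


def transcripts_by_go(rows):
    """Map each GO id to the sorted list of transcripts annotated with it."""
    groups = defaultdict(set)
    for transcript_id, go_field in rows:
        for go_id, _namespace, _description in parse_go_field(go_field):
            groups[go_id].add(transcript_id)
    return {go_id: sorted(ids) for go_id, ids in groups.items()}


# Invented (transcript_id, GO cell) pairs, not real annotation results.
rows = [
    ("c1000_g1_i1", "GO:0005634^cellular_component^nucleus`GO:0003677^molecular_function^DNA binding"),
    ("c1001_g1_i1", "GO:0003677^molecular_function^DNA binding"),
    ("c1000_g1_i2", "."),
]

for go_id, transcript_ids in sorted(transcripts_by_go(rows).items()):
    for transcript_id in transcript_ids:
        print(f"{go_id}\t{transcript_id}")
```

A table like this could be a starting point for grouping genes by GO category in a heatmap, for instance via PtR's --gene_factors option, but the exact file format that option expects should be taken from the PtR usage info.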

I wouldn't mind doing a bit of coding to get this working if you'd like! If you could just provide an example of the desired output table, I could probably figure it out.. :)

Thanks!

Matt. 

Brian Haas

Nov 5, 2015, 6:33:26 AM
to Matthew Neave, trinityrnaseq-users, Giorgio Casaburi
Hi Matt,

This probably reflects my need to get a new release of Trinotate out there (long overdue).  Attached is a version of the script that should work with the latest report format.

best,

~b

Trinotate_get_feature_name_encoding_attributes.pl