GOplot support issues

duart...@gmail.com

Apr 10, 2017, 10:37:06 AM
to trinityrnaseq-users
Greetings,

I am using Trinity and Trinotate to run the DE, annotation and enrichment analysis on my transcriptome data, and it works great (thanks for all those scripts btw!).

However, I am having some difficulty generating the plots with GOplot.

Initially, I tried to run it from the analyze_diff_expr.pl script. The DE analysis and GOseq worked for all my samples/comparisons, creating the expected output files. But during the GOplot run, I noticed two issues:

1. For one of the comparisons, one of the files required by GOplot (EC.genelist) is not created, although EC.david is there. The following error was reported:

Error, missing value for required column field: Term at /home/gtd/Bioinfo/Progs/trinityrnaseq-Trinity-v2.4.0/Analysis/DifferentialExpression/../../PerlLib/DelimParser.pm line 252.
    DelimParser::Writer::write_row(DelimParser::Writer=HASH(0x1406f20), HASH(0x11a0e50)) called at /home/gtd/Bioinfo/Progs/trinityrnaseq-Trinity-v2.4.0/Analysis/DifferentialExpression/prep_n_run_GOplot.pl line 107

I thought the problem might be that my assembly combines the output of several assemblers, and therefore has different transcript IDs (although both EC files are generated for the other comparisons). Nevertheless, the same error occurred when I ran the analysis on the sample data found in /Spombe_analyzeDiffExprWithGOseq/, either by adding the --include_GOplot parameter to the runMe.sh script or by running the analysis step by step myself.

2. As mentioned, for the remaining comparisons, both EC files were generated. However, the analysis crashes when trying to generate the first bubble graph, with the following error:

Error in data.frame(fg_params, label = as.vector(label_matrix), stringsAsFactors = FALSE) :
  arguments imply differing number of rows: 1, 0
Calls: GOBubble ... draw_table -> tableGrob -> gtable_table -> data.frame
Execution halted
Error, CMD: /home/gtd/Bioinfo/Progs/trinityrnaseq-Trinity-v2.4.0/Analysis/DifferentialExpression/GOplot.Rscript --EC_david trans_counts.counts.matrix.c_vs_s.DESeq2.DE_results.P1e-3_C1.DE.subset.GOseq.enriched.GOplot_dat/EC.david --EC_genelist trans_counts.counts.matrix.c_vs_s.DESeq2.DE_results.P1e-3_C1.DE.subset.GOseq.enriched.GOplot_dat/EC.genelist --pdf_outfile trans_counts.counts.matrix.c_vs_s.DESeq2.DE_results.P1e-3_C1.DE.subset.GOseq.enriched.GOplot_dat.pdf died with ret 256 at /home/gtd/Bioinfo/Progs/trinityrnaseq-Trinity-v2.4.0/Analysis/DifferentialExpression/prep_n_run_GOplot.pl line 234.

Using these EC output files, I ran the GOplot analysis in R myself and found that the error occurs at this step:

> GOBubble(circ, labels = 3)
Error in data.frame(fg_params, label = as.vector(label_matrix), stringsAsFactors = FALSE) :
  arguments imply differing number of rows: 1, 0

In addition, although I was able to generate the other bubble plots, none of them shows the GO term labels that should accompany each circle.

Could you help me to solve these issues, please?

Best regards,

Gustavo

Brian Haas

Apr 10, 2017, 11:03:35 AM
to duart...@gmail.com, trinityrnaseq-users
Hi Gustavo,

I'm glad to hear that most things worked.  The GOplot integration is a new addition and it's mostly to just get you started.  I don't have much time to provide high level support for it.  I'd suggest checking w/ the GOplot developers and see if they can provide some insight. 

sorry I can't help more right now.

~b

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Elena

Apr 12, 2017, 3:55:49 PM
to trinityrnaseq-users
First of all, Brian, Thanks!

The addition you made to Trinity to produce formatted files suitable for GOplot is awesome. In my case, working with several species, it has been super nice to move beyond the Venn diagram to the graphical options in this package. Visualization tools have helped us see that the same GO terms enriched in several species behave very differently when you tease things apart and combine GO with gene plotting (which is the GOplot feature I like!).

Now, regarding the problems with the files created.

Gustavo, I did find some issues with the files that were automatically generated. However, it is super easy to make them from the Trinity output files and load them in R. I have only used the bubble, bar, and circ graphs from the GOplot R package.

To generate the files manually, open these files:

1/      XXXXXXXXXXXXXXX.DESeq2.DE_results.P1e-3_C2.DE.subset.GOseq.enriched -----> to generate EC.david

2/     XXXXXXXXXXXXXX.DESeq2.DE_results.P1e-3_C2.DE.subset -----------> to generate EC.genelist

and modify them in the text editor of your choice so they can be read into R. Basically, you just move a few columns around and change the headers. Also, remember to choose a cutoff on the FDR column so that you keep only significant GO terms!

The issues you are seeing in R are probably related to those original files. If you prepare them manually from the information already in the Trinity analysis output, you should be fine.
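
For anyone who would rather script these steps than edit by hand, here is a minimal Python sketch of the same idea (it is not what prep_n_run_GOplot.pl does internally). The input column names (`category`, `ontology`, `term`, `over_represented_FDR`, `gene_ids`, `log2FoldChange`, `padj`, plus an `id` column for the feature name) and the output headers are assumptions about what the GOseq/DESeq2 tables contain and what GOplot expects, so check them against your own files. The function names, the 0.05 cutoff, and the example rows are made up for illustration.

```python
import csv
import io


def goseq_to_david(goseq_rows, fdr_cutoff=0.05):
    """Keep GO terms passing the FDR cutoff and rename columns (assumed EC.david-style layout)."""
    kept = []
    for row in goseq_rows:
        fdr = float(row["over_represented_FDR"])
        term = row["term"].strip()
        # Filter on the FDR column rather than the raw p-value, and skip rows
        # with no term description, since the Term field is required.
        if fdr > fdr_cutoff or term in ("", "NA"):
            continue
        kept.append({
            "Category": row["ontology"],
            "ID": row["category"],
            "Term": term,
            "Genes": row["gene_ids"],
            "adj_pval": fdr,
        })
    return kept


def de_to_genelist(de_rows):
    """Keep only feature ID, log fold change and adjusted p-value (assumed EC.genelist-style layout)."""
    return [
        {"ID": row["id"],
         "logFC": float(row["log2FoldChange"]),
         "adj.P.Val": float(row["padj"])}
        for row in de_rows
    ]


def to_tsv(rows, fieldnames):
    """Serialize rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example rows; real rows would be read from the Trinity output files.
goseq_rows = [
    {"category": "GO:9999991", "ontology": "BP", "term": "made-up term A",
     "over_represented_FDR": "0.001", "gene_ids": "gene1, gene2"},
    {"category": "GO:9999992", "ontology": "MF", "term": "made-up term B",
     "over_represented_FDR": "0.4", "gene_ids": "gene3"},
]
de_rows = [{"id": "gene1", "log2FoldChange": "2.5", "padj": "0.0004"}]

david_tsv = to_tsv(goseq_to_david(goseq_rows),
                   ["Category", "ID", "Term", "Genes", "adj_pval"])
genelist_tsv = to_tsv(de_to_genelist(de_rows), ["ID", "logFC", "adj.P.Val"])
print(david_tsv)
print(genelist_tsv)
```

Saving `david_tsv` and `genelist_tsv` to files gives two tab-delimited tables that can be loaded in R and handed to GOplot. Only the first made-up term survives here, because the second one fails the FDR cutoff.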

I am not sure I understand this: "In addition, although I was able to generate the other Bubble plots, none of them show the GO terms that should be respective to each circle."
Could you explain a little more?

duart...@gmail.com

Apr 17, 2017, 3:59:25 AM
to trinityrnaseq-users
Hello Elena, thanks!

I was checking the files, and apparently the script is taking data from the wrong columns of the DE.subset file when generating the GOplot EC.genelist input file. The values in columns D and G of DE.subset are being used as logFC and adj.P.Val, respectively, when they should come from columns G and K, respectively.

In addition, the automatically generated EC.david file apparently just reproduces the initial list from the .DE.subset.GOseq.enriched file, which is based on p-values rather than FDR. I have not yet had time to go into the full details of the GOseq pipeline, but it seems to me that the final DE.subset.GOseq.enriched/depleted files are compiled from the control-UP.subset.GOseq.enriched/depleted and treatment-UP.subset.GOseq.enriched/depleted files, with an extra statistical test. However, since I am specifically interested in the enrichment results for each condition, it makes more sense to me to generate the files manually, applying an FDR cutoff as you suggested.

Best wishes,

Gustavo

Elena

Apr 19, 2017, 1:16:17 PM
to trinityrnaseq-users
Good luck with that!
Really, generating the files takes only a moment. By the way, if you are not setting a reference in the DE analysis (e.g. control vs. sample set 1, control vs. sample set 2, sample 1 vs. sample 2), remember to double-check what you are actually comparing: 1 against 2, or 2 against 1! |log fold change| would be the same, but it would have the opposite sign!
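
To make the sign issue concrete, here is a tiny Python sketch with made-up expression values. The function name is hypothetical, and which condition ends up as the numerator depends on how your comparison was set up, so treat the direction as something to verify rather than assume.

```python
import math


def log2_fold_change(numerator_mean, denominator_mean):
    """Return log2(numerator / denominator); positive means higher in the numerator condition."""
    return math.log2(numerator_mean / denominator_mean)


# Made-up mean expression values for one gene in two conditions.
sample1_mean = 8.0
sample2_mean = 2.0

one_vs_two = log2_fold_change(sample1_mean, sample2_mean)
two_vs_one = log2_fold_change(sample2_mean, sample1_mean)

# Same magnitude, opposite sign.
print(one_vs_two, two_vs_one)
```

A gene that looks "up" in one direction of the comparison looks "down" in the other, so the logFC column in EC.genelist should follow the direction you intend to plot.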

If you run into any trouble, let me know... I'll try to help. :)

Govind Raj

Nov 22, 2017, 1:45:09 PM
to trinityrnaseq-users
Dear Elena and other members,

I am having format issues with GOplot. I used the Trinity pipeline for de novo transcriptome assembly and transcript abundance estimation (bowtie2 & RSEM), and ran edgeR separately (not through the Trinity scripts) for the DE analysis. I also used the Trinotate pipeline to annotate the Trinity de novo transcriptome (processed with cd-hit-est). I managed to run GOseq successfully using run_GOseq.pl, and then used its outputs to run GOplot with the prep_n_run_GOplot.pl Trinity script. However, I get the format error below:
"Error, file: ../pcandei_DE_genes_trinotation/FLvsEH.Acute_tr_0.05_FDR.txt has unexpected format... no 'sample' starting header.
at /home/raj/data3/trinityrnaseq-devel/Analysis/DifferentialExpression/prep_n_run_GOplot.pl line 151, <$fh> line 1."

I followed this thread to try to resolve the issue: https://groups.google.com/forum/#!searchin/trinityrnaseq-users/GOplot|sort:date/trinityrnaseq-users/WQUsHAMt7as/XA42VuI-AgAJ
Unfortunately, it didn't work for me and gave the following error:
Command: raj@wavesim:~/packages/trinityrnaseq-2.2.0/Analysis/DifferentialExpression$ ./prep_GOplot.pl ~/data3/pcandei_trinotation/pcandei_Trinotate.xls.gene_ontology ~/data3/pcandei_trinotation/FLvsEH_goseq_17112017/FLvsEH.Acute_tr_0.05_FDR.GOseq.enriched > ECDavid_format_FLvsEH.Acute_tr_0.05_FDR.GOseq.enriched.txt

Error from above command:
Error, missing value for required column field: Term at /home/raj/packages/trinityrnaseq-2.2.0/Analysis/DifferentialExpression/../../PerlLib/DelimParser.pm line 221.
        DelimParser::Writer::write_row(DelimParser::Writer=HASH(0x5eb8150), HASH(0x5ea8c18)) called at ./prep_GOplot.pl line 43

Then, I went through current thread(https://groups.google.com/forum/#!searchin/trinityrnaseq-users/Error$2C$20missing$20value$20for$20required$20column$20field$3A$20Term$20at|sort:date/trinityrnaseq-users/L2PXl5lNfSE/yLf2LMgWEQAJ),
but I am not able to understand the instructions given here for manually generating EC.DAVID and EC.genelist.

Could you please explain the instructions in more detail?

Many thanks in advance.

Regards,
Govindraj Chavan

Brian Haas

Nov 24, 2017, 11:09:00 AM
to trinityrnaseq-users

finally able to look into this



> I was checking the files, and apparently the script is taking data from the wrong columns of the DE.subset file when generating the GOplot EC.genelist input file. The values in columns D and G of DE.subset are being used as logFC and adj.P.Val, respectively, when they should come from columns G and K, respectively.


The column extractions should be working correctly here.  Note, the column headers are off by one, which is an R thing (and annoying, but that's R).
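
To illustrate the off-by-one header: when R's write.table writes row names, it leaves the header one field shorter than the data rows, so counting spreadsheet columns by their header labels is shifted by one. A minimal Python sketch of reading such a file follows. The function name, the `id` label for the unnamed first column, and the example columns are assumptions for illustration, not the logic used in prep_n_run_GOplot.pl.

```python
import csv
import io


def read_r_style_table(handle, id_name="id"):
    """Read a tab-delimited table whose header may lack a label for the row-name column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = []
    for fields in reader:
        if len(fields) == len(header) + 1:
            # Header is one field short: the first data field is the row name.
            names = [id_name] + header
        elif len(fields) == len(header):
            names = header
        else:
            raise ValueError(f"unexpected number of fields: {len(fields)}")
        rows.append(dict(zip(names, fields)))
    return rows


# Made-up example: four header fields, five data fields.
example = io.StringIO(
    "sampleA\tsampleB\tlog2FoldChange\tpadj\n"
    "gene1\tcontrol\ttreated\t2.5\t0.0004\n"
)
rows = read_r_style_table(example)
print(rows[0]["id"], rows[0]["log2FoldChange"], rows[0]["padj"])
```

Looking values up by name after this realignment avoids the confusion of matching column letters in a spreadsheet against shifted header labels.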

 
> In addition, the automatically generated EC.david file apparently just reproduces the initial list from the .DE.subset.GOseq.enriched file, which is based on p-values rather than FDR.

It should all be based on FDR and not the original p-values.

 

Govind Raj

Nov 24, 2017, 11:12:25 AM
to trinityrnaseq-users
Dear Dr Brian,

Thank you very much for sorting out this issue and clarifying my doubts about the prep_n_run_GOplot.pl output files.

Regards,
Govindraj Chavan

Brian Haas

Nov 24, 2017, 11:21:37 AM
to Govind Raj, trinityrnaseq-users
Sure thing.

Just to summarize some of the issues we were having:

The DE analysis needs to be done using the trinity run_DE_analysis.pl in order to get the inputs properly formatted for use with the pre-GO-plot script.  

The GO plotter example script had GO IDs hardcoded (probably from the GOplot vignette) which weren't relevant to the present study.  I've commented out that part of the script and the update will go in the next release.
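
On the first point, a quick pre-flight check can save a failed run. This Python sketch only mirrors the wording of the error reported earlier in the thread ("no 'sample' starting header"); the function name is hypothetical, the two example headers are assumed layouts, and the real check inside prep_n_run_GOplot.pl may differ.

```python
def header_starts_with_sample(first_line):
    """Return True if the first tab-delimited header field begins with 'sample'."""
    first_field = first_line.rstrip("\r\n").split("\t")[0]
    return first_field.startswith("sample")


# Assumed example headers: one in the style of Trinity's DE output,
# one in the style of a table written by a standalone edgeR run.
trinity_style = "sampleA\tsampleB\tlogFC\tlogCPM\tPValue\tFDR\n"
standalone_edger = "logFC\tlogCPM\tPValue\tFDR\n"

print(header_starts_with_sample(trinity_style))
print(header_starts_with_sample(standalone_edger))
```

If the check fails for your DE results file, re-running the DE step through run_DE_analysis.pl is the route suggested above.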

Feel free to add anything else, Govind.

best,

~b

Govind Raj

Nov 24, 2017, 11:27:04 AM
to trinityrnaseq-users
Dear Dr Brian,

Thank you once again for clarifying the input file format issues with prep_n_run_GOplot.pl.

It is an invaluable effort on your part to resolve this issue.

Regards,
Govindraj Chavan