Help with javaGSEA


Johann Shane Tian

Jun 11, 2018, 9:48:52 PM
to gsea-help

Dear Sir/Mdm,

 

My colleagues and I are trying to perform a gene set enrichment analysis with the javaGSEA Desktop Application and would like some advice about our process; we are novices in the field. We have consolidated a list of genes (upregulated and downregulated) from a differential expression analysis (DEA) of RNA-seq datasets, and we would like to know whether any of them overlap with the NFkB signaling pathway. From what I understand of the documentation, the tool requires an expression dataset and phenotype labels, but we are not sure how to derive these from our DEA. We read the GSEA article in PNAS, but as far as we could tell it uses microarray data as its example. Does the tool, then, only work with microarray datasets? If not, is there a standard website where we could get reference files for the expression dataset and phenotype labels?

 

We heard from other bioinformaticians that websites like ConsensusPathDB and PantherDB can also perform gene set enrichment analysis, and we noticed that they only require the gene list generated from the DEA. Would it be appropriate for us to use those tools instead of javaGSEA? Can javaGSEA work from the gene list alone?

 

Hope you can help us understand the program better.

 

Thank you.

 

Regards,

Johann Shane

ptamayo

Jun 12, 2018, 1:58:28 PM
to gsea-help
Johann,

  If you just want to know whether your list of top differential expression analysis (DEA) genes overlaps with NFkB pathways, I'd suggest you copy your gene list and paste it into the query window of the MSigDB "Investigate Gene Sets" web page:


 If I do this exercise with one of my own NFkB-relevant DEA results, I get the results attached below (selecting collections H, C2, C3 and C6). As you can see, entry number 6 is the gene set "HALLMARK_TNFA_SIGNALING_VIA_NFKB", which has a significant overlap. Try it with your gene list and see what you get.
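
As a rough illustration of what a "significant overlap" means here, the sketch below computes a hypergeometric tail probability for the overlap between a query list and one gene set. It is a minimal Python sketch and not MSigDB's actual implementation; the function name `overlap_p_value`, the toy gene lists, and the universe size of 20,000 genes are all assumptions made for the example.

```python
from math import comb


def overlap_p_value(query_genes, gene_set, universe_size):
    """Return (overlap size, P(X >= overlap)) under a hypergeometric model.

    The p-value is the chance of seeing at least the observed overlap if the
    query genes had been drawn at random from the gene universe.
    """
    query = set(query_genes)
    members = set(gene_set)
    k = len(query & members)   # observed overlap
    n = len(query)             # size of the query list
    K = len(members)           # size of the gene set
    N = universe_size          # assumed number of genes in the universe
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total


# Toy data, hypothetical: not a real DEA result or a real MSigDB gene set.
query = ["NFKBIA", "TNFAIP3", "RELB", "GAPDH"]
toy_gene_set = ["NFKBIA", "TNFAIP3", "RELB", "NFKB1", "JUNB"]
overlap, p_value = overlap_p_value(query, toy_gene_set, universe_size=20000)
print(overlap, p_value)
```

A small p-value says the overlap is larger than random sampling would be expected to produce; it does not say anything about the direction of regulation.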

Best,

--Pablo
Screen Shot 2018-06-12 at 10.52.50 AM.png

Johann Shane Tian

Jun 12, 2018, 10:14:15 PM
to gsea...@googlegroups.com
Hi Dr Pablo,

I tried your method (with the addition of C5), and yes, I got hits like "GO_REGULATION_OF_I_KAPPAB_KINASE_NF_KAPPAB_SIGNALING" and "GO_POSITIVE_REGULATION_OF_I_KAPPAB_KINASE_NF_KAPPAB_SIGNALING". But I noticed that RELB does not overlap with either of the two pathways (as attached, columns 2 and 3). REL is known to be associated with NFkB. Is there a reason why this isn't captured?


But I do agree that this web tool is very useful for me. One more thing: is there a way to get graphs like the ones below using just my differentially expressed gene list?


Thank you.

Regards,
Johann



PABLO TAMAYO

Jun 12, 2018, 10:58:19 PM
to gsea...@googlegroups.com
I think it is just that those two gene sets for some reason do not include REL as a member.

You can put your gene list in a text file and run pre-ranked GSEA using the c5 collection to produce the enrichment plots like that figure. You can use the GenePattern (genepattern.org) pre-ranked GSEA module to accomplish that (see fig attached). The ranked list will be your sorted DEG. The gene set database will be the c5 GMT file which you can get from here:


--Pablo





___________________________________________________________
Dr. Pablo Tamayo (pta...@ucsd.edu)

Director, UCSD Center for Cancer Target Discovery and Development (CTD2)
Group Leader, Computational Cancer Analysis Laboratory (CCAL)
Co-Director, Genomics and Computational Biology Shared Resource (GCBSR)
Professor, Division of Medical Genetics, UC San Diego School of Medicine

UC San Diego Moores Cancer Center (Room 3017)
3855 Health Sciences Drive MC 0658, La Jolla, CA  92093-0658

Visiting Scientist, Cancer Program, Broad Institute of MIT/Harvard
___________________________________________________________

Johann Shane Tian

Jun 12, 2018, 11:34:05 PM
to gsea...@googlegroups.com
Dear Dr Pablo,

Thank you for your advice. Seems like I need to start ranking my differential genes first.

Do you mind if I further ask if the difference between pre-ranked GSEA and GSEA is just the input and type of statistical test used?

I saw a webpage that shows me instructions on how to rank my genes (http://genomespot.blogspot.com/2014/09/data-analysis-step-8-pathway-analysis.html). I will give it a try first and let you know if it works for me.

Thank you again Dr Pablo.

Regards,
Johann

PABLO TAMAYO

Jun 13, 2018, 10:51:21 PM
to gsea...@googlegroups.com
Johann,

  It is the same statistical test. The difference is that in pre-ranked GSEA the genes have already been sorted in some way, whereas in standard GSEA the sorting has to be done from the original dataset, e.g. by differential gene expression.
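
To illustrate the point, here is a minimal Python sketch of the weighted running-sum enrichment score described in the GSEA PNAS paper, applied to a list that is already ranked. It is a simplification for illustration only: it omits the permutation test and the normalization that the real tool performs, and the function name `enrichment_score` and the toy genes and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set on a pre-ranked list.

    ranked_genes and scores must already be sorted from the most positive
    score to the most negative; that sorting is the step pre-ranked GSEA
    leaves to the user.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(hits)
    if total_hit == 0 or n_miss == 0:
        return 0.0  # degenerate case: nothing to compare against
    running = 0.0
    best = 0.0
    for w, h in zip(hit_weights, hits):
        # Step up on a gene-set member, step down on a non-member.
        running += w / total_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Toy ranked list, hypothetical values.
genes = ["A", "B", "C", "D", "E", "F"]
gene_scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, gene_scores, {"A", "B"})
es_bottom = enrichment_score(genes, gene_scores, {"E", "F"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the list gives a positive score and one concentrated at the bottom gives a negative score, whichever way the ranking was produced.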

 Give pre-ranked GSEA a try and let me know if it works for you.

Best,

--Pablo



Johann Shane Tian

Jun 18, 2018, 4:41:46 AM
to gsea...@googlegroups.com
Dear Dr Pablo,

Thank you for the follow up, and sorry for the delay.

I will get back to you on my progress.

Thank you once again.

Regards,
Johann

Johann Shane Tian

Jun 18, 2018, 10:07:21 AM
to gsea...@googlegroups.com
Dear Dr Pablo,

I tried pre-ranked GSEA with a dummy list and it works fine. But I have a problem generating a ranked list from my differentially expressed genes. I have read in several forums how to rank them. Usually they take the sign of the log2foldchange and -log10(p-value), and then compute -log10(p-value)/sign of log2foldchange. They also mention other options like goseq or GSEABase to rank the genes. The problem I am facing is with the DESeq2 results: for genes that go beyond the threshold, parameters such as pvalue and padj are reported as "NA". But "NA" values are of no use in a pre-ranked GSEA run.

Some advised me to add "cooksCutoff = FALSE" when extracting the DESeq2 results, but it didn't work for me; it still produced the same results. Dr Michael Love, developer of DESeq2, advised changing the "NA" to "1". I tried that and it still gave "NaN" in the rank list. I'm not sure whether it is sound to change the "NaN" to "0", though.
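
For reference, one way a "NaN" could arise with this formula, assuming the rank really is computed as -log10(p-value) divided by the sign of the log2 fold change: when the p-value is set to 1 and the fold change is exactly 0, the calculation becomes 0/0, which R evaluates to NaN. The Python sketch below multiplies by the sign instead of dividing and treats missing inputs explicitly. The function name `signed_log_p` and the floor of 1e-300 on the p-value are assumptions for the example, not part of DESeq2 or GSEA.

```python
import math


def signed_log_p(log2_fold_change, p_value):
    """Return sign(log2FC) * -log10(p), or None when an input is missing."""
    if log2_fold_change is None or p_value is None:
        return None
    if math.isnan(log2_fold_change) or math.isnan(p_value):
        return None
    p = max(p_value, 1e-300)  # assumed floor to avoid log10(0)
    sign = (log2_fold_change > 0) - (log2_fold_change < 0)  # -1, 0 or 1
    # Multiplying by the sign gives 0 for a zero fold change; dividing by it
    # would attempt a division by zero instead.
    return sign * -math.log10(p)


up = signed_log_p(2.0, 0.01)              # upregulated, small p-value
down = signed_log_p(-1.5, 0.001)          # downregulated, smaller p-value
flat = signed_log_p(0.0, 1.0)             # no change, p-value replaced by 1
missing = signed_log_p(float("nan"), 0.5)  # missing fold change
print(up, down, flat, missing)
```

Genes that return None have no usable score; whether to drop them or assign them 0 is a judgment call that this sketch does not settle.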

Do you happen to have any advice for me?

Regards,
Johann

PABLO TAMAYO

Jun 18, 2018, 7:52:04 PM
to gsea...@googlegroups.com
Hi Johann,

For pre-ranked GSEA you don't need the p-values, only the DEA scores themselves, e.g. the log of the ratio of each gene between the A and B phenotypes, or the t-test statistic, rather than the -log10(p-value). Do you have NAs in the DEA scores themselves?
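
A minimal Python sketch of that idea: take one score per gene, skip genes whose score is missing, sort, and write a two-column tab-delimited ranked list. The function names, the toy rows, and the choice to drop missing scores rather than impute them are assumptions made for the example.

```python
import csv
import math


def build_ranked_list(rows):
    """Turn (gene, score) pairs into a list sorted from highest to lowest.

    Scores that are 'NA', None, empty, NaN or infinite are skipped.
    """
    ranked = []
    for gene, score in rows:
        try:
            value = float(score)
        except (TypeError, ValueError):
            continue  # 'NA', None or empty string: no usable score
        if math.isnan(value) or math.isinf(value):
            continue
        ranked.append((gene, value))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def write_rnk(ranked, path):
    """Write gene<TAB>score lines, one gene per line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(ranked)


# Toy rows, hypothetical: gene name plus a DEA score such as a test statistic.
rows = [("GENE_A", "2.5"), ("GENE_B", "NA"), ("GENE_C", -4.1),
        ("GENE_D", None), ("GENE_E", "7")]
ranked = build_ranked_list(rows)
print(ranked)
```

Calling `write_rnk(ranked, "my_genes.rnk")` would then save the list to a file (the file name is hypothetical).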

--Pablo


Johann Shane Tian

Jun 18, 2018, 9:53:06 PM
to gsea...@googlegroups.com
Dear Dr Pablo,

Yes, apparently there are NAs in the statistics column for some genes after running DESeq2.

So what you mean is that I just need to use the "stats" values (from either the likelihood ratio test or the Wald test in DESeq2) for the ranking? I haven't come across any article (yet) that says I could use the stats values, though.

Regards,
Johann

Johann Shane Tian

Jun 29, 2018, 12:25:43 PM
to gsea...@googlegroups.com
Dear Dr Pablo,

Just to follow up on my work: I was able to generate a ranked list using a published pipeline and ran the GSEA. I have already passed the analysis on to the relevant members.

Thank you very much for your help.

Regards,
Johann

PABLO TAMAYO

Jun 29, 2018, 10:06:23 PM
to gsea...@googlegroups.com
You're welcome. Thanks for the update. I'm glad things worked out well.

Best,

--Pablo

