Expression dataset

Bernhard

May 6, 2021, 12:18:11 PM
to gsea-help

Dear all GSEA users,

 I have RNA-seq data (2 groups, 3 replicates each) that I want to use for GSEA. I am not sure which values (read counts, FPKM, TPM, etc.) I should upload as the “Expression dataset” for GSEA. I have read that it is possible to use the FPKM values provided directly by the RNA-seq provider. Is that right, or do I need normalization or other values?

 Thank you for your help in advance

Anthony Castanza

May 6, 2021, 12:20:17 PM
to gsea...@googlegroups.com

Hi Bernhard,

 

We do not recommend using FPKM for GSEA. These values are not properly normalized for “between-sample” differential expression calculations, which is effectively what GSEA has to do to rank the genes. We recommend running something like DESeq2 and extracting the normalized counts table that is produced during that analysis. The DESeq2 modules on both GenePattern.org and UseGalaxy.org produce this output (GenePattern does it by default; on Galaxy, “Output normalized counts table” needs to be toggled to True first). You can also use other methods of between-sample normalization, such as TMM, but the DESeq2 method is probably the easiest.
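
For readers wondering what "between-sample" normalization does, here is a minimal sketch of the median-of-ratios idea that DESeq2's normalized counts are based on. It is an illustration only, not DESeq2's implementation, and the function names `size_factors` and `normalize` are hypothetical. For a real analysis, run DESeq2 itself as recommended above.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios method.

    counts: list of rows (one per gene), each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    usable_rows = []
    log_geo_means = []
    for row in counts:
        # Genes with a zero count in any sample are skipped: their
        # geometric mean is zero, so the ratio would be undefined.
        if all(c > 0 for c in row):
            usable_rows.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to its geometric mean across samples,
        # computed on the log scale; the median ratio is the size factor.
        log_ratios = [
            math.log(row[j]) - lgm
            for row, lgm in zip(usable_rows, log_geo_means)
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]
```

If one sample was simply sequenced twice as deeply as another, its size factor comes out twice as large, and the normalized counts for unchanged genes line up across the two samples.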

 

An additional note: when running GSEA on small groups, such as a 3 vs 3 experiment, you’ll need to set the permutation type to “gene_set” instead of the default “phenotype”.
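
One likely reason for this advice is simple counting: phenotype permutation builds its null distribution by shuffling the sample labels, and with 3 vs 3 samples there are very few distinct ways to do that. The short sketch below only does that arithmetic; the helper name `distinct_label_assignments` is hypothetical and is not part of GSEA.

```python
from math import comb


def distinct_label_assignments(n_group_a, n_group_b):
    """Number of distinct ways to assign the group A label to samples."""
    return comb(n_group_a + n_group_b, n_group_a)


# A 3 vs 3 design: choose which 3 of the 6 samples are labeled group A.
print(distinct_label_assignments(3, 3))  # 20
```

With only 20 distinct labelings, a label-shuffling null distribution is very coarse, whereas permuting gene sets does not depend on the number of samples.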

 

-Anthony

 

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego

http://gsea-msigdb.org/

Bernhard

May 7, 2021, 2:17:15 AM
to gsea-help
Thank you, Anthony. So I have to use the DESeq2 module? Which file format should the raw read count input file be in? My RNA-seq provider delivered it as an Excel file.

Anthony Castanza

May 7, 2021, 2:23:38 AM
to gsea...@googlegroups.com
Hi Bernhard,

You don't have to use DESeq2, but if you're unfamiliar with the necessary data normalization steps it's certainly the most straightforward way to do it.

You should be able to get to the required GCT format pretty simply from Excel by adding the requisite header rows, Description column, etc. Then you'll want to save as txt (tab delimited). This will give the file a .txt extension, which you'll need to change to .gct.
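
If you would rather script this than edit the file by hand, the following is a minimal sketch. It assumes the Excel sheet was saved as tab-delimited text with one header row, gene identifiers in the first column, and one column per sample after that; the function name `tsv_to_gct` and the "na" placeholder in the Description column are illustrative choices, not something the DESeq2 module requires.

```python
import csv


def tsv_to_gct(tsv_path, gct_path):
    """Convert a tab-delimited counts table to a GCT (version 1.2) file."""
    with open(tsv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    samples = header[1:]  # everything after the gene ID column

    with open(gct_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["#1.2"])                    # line 1: format version
        writer.writerow([len(data), len(samples)])   # line 2: genes, samples
        writer.writerow(["Name", "Description"] + samples)
        for row in data:
            # Insert a placeholder Description between the ID and the values.
            writer.writerow([row[0], "na"] + row[1:])


# Example (hypothetical file names):
# tsv_to_gct("raw_counts.txt", "raw_counts.gct")
```

This writes the file with the .gct extension directly, so no renaming is needed afterwards.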


Let me know if you have any questions about the process.


-Anthony


Bernhard

May 7, 2021, 4:36:19 AM
to gsea-help
Thank you, Anthony! Your advice is perfect! I think it has worked so far.

Bernhard

May 7, 2021, 4:55:58 AM
to gsea-help
Hi Anthony,

Once again: my main problem was which data I should use for GSEA. I have now taken the raw read counts and normalized them with DESeq2. Hopefully that is right?

Bernhard

May 11, 2021, 11:03:52 AM
to gsea-help
Everything works great, thank you for the immediate help!

Anthony Castanza

May 11, 2021, 12:02:05 PM
to gsea...@googlegroups.com
Hi Bernhard,

My apologies for missing your final request for clarification. Yes, the normalized read counts table taken after normalizing with DESeq2 is the correct input for standard GSEA.
Glad you were able to get it to work.




Bernhard

Jun 2, 2021, 12:35:18 PM
to gsea-help
If you normalize your raw read counts with DESeq2 through GenePattern, you also get a ranked gene list. What exactly does the score in that list represent?

Anthony Castanza

Jun 2, 2021, 1:01:24 PM
to gsea...@googlegroups.com

Hi Bernhard,

 

The DESeq2 module also performs a standard differential expression analysis and provides those results, but I don’t think it actually formats them as an RNK file, does it?

Bernhard

Jun 4, 2021, 4:06:38 AM
to gsea-help
Sorry, my wording was misleading. By "ranked" I meant that the genes are arranged from the highest to the lowest score. The file is a ranked_gene_list...xls