nanostring data analysis

Shwetank Verma

Aug 2, 2018, 11:56:34 PM

to gsea-help

I ran a NanoString analysis using a mouse inflammation panel, and I am trying to analyze the output data with the GSEA software. One step of the analysis requires a "chip platform" input. The menu offers various chip names (Agilent, Illumina, etc.), but I don't find any option for the NanoString panel. Has anyone used the GSEA software to analyze NanoString data? Please let me know if you can help me understand what I should enter here.

David Eby

Aug 6, 2018, 9:06:13 AM

to gsea-help

Hi Shwetank,

The "chip platform" selection is only required if you are working at the probe (or transcript) level and need to collapse these to the gene level. If you already have data at the gene level then collapsing is not necessary (set the option to false). Just make sure that you have HUGO gene symbols if you are using the MSigDB gene sets we provide online.
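One rough way to check whether a table is already at the gene level is to look for gene symbols that occur on more than one row. The sketch below is only an illustration: the function name `duplicated_symbols` and the example row identifiers are hypothetical, and it assumes the first column of the expression table has already been read into a list of strings.

```python
from collections import Counter

def duplicated_symbols(row_ids):
    """Return the identifiers that occur on more than one row, sorted."""
    counts = Counter(s.strip() for s in row_ids)
    return sorted(s for s, n in counts.items() if n > 1)

# Hypothetical row identifiers from an expression table.
rows = ["Il6", "Tnf", "Il1b", "Tnf"]
print(duplicated_symbols(rows))  # prints ['Tnf']
```

An empty result suggests one row per gene, in which case collapsing can be turned off as described above; repeated symbols would need to be resolved first.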

Regards,

David

David Eby
www.gsea-msigdb.org
igv.org
genepattern.org

amag

Aug 21, 2018, 5:41:49 PM

to gsea-help

Hi David,

I have NanoString data that I would like to analyze.

I have a total of 40 samples, made up of two cohorts, A and B, with n = 21 and n = 19 respectively.

The panel used assessed the expression of 739 genes.

Differentially expressed genes were determined using NanoStringDiff [PMID 27471031].

I investigated the resulting differential gene lists using the online "Investigate Gene Sets" tool to determine overlaps with MSigDB curated gene sets.

The top 100 gene set overlaps were computed using the default FDR threshold of 0.05.

Although there was quite a bit of redundancy in the list of gene sets, several of them turned out to be of interest.

However, when the same data is analyzed using GSEA, gene sets deemed significant by overlap analysis were not significant by GSEA.

I understand that GSEA accounts for actual expression levels whereas the overlap analysis does not, but my question is: when is it appropriate to use one analysis tool over the other?

Does the cohort size or the number of genes impact the decision?

Is the output of the online overlap analysis tool suitable for publishing?

Many thanks in advance for your help.

amag

ptamayo

Aug 23, 2018, 12:30:59 AM

to gsea-help

Amag,

Both methods (MSigDB overlap and GSEA) can be published if you clarify which one you are using in your analysis.

The overlap analysis uses a simplistic hypergeometric null distribution and tends to produce overly optimistic p-values that are, in general, considerably smaller than those from GSEA. GSEA produces more realistic p-values and FDRs because it considers cohort size and the actual differential expression values, but it may produce non-significant results if the signal is weak, as appears to be the case in your data. Another factor to consider is that, because you are using NanoString, the number of genes (739) is much smaller than in a typical genome-wide differential analysis, so you can expect fewer genes to be significant.
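To make the hypergeometric null concrete, here is a minimal sketch of an upper-tail overlap p-value, written with the Python standard library only. It is not the MSigDB implementation. The function name `overlap_pvalue` and all the numbers in the example (gene set size, list size, overlap, and the 20000-gene background) are assumptions chosen for illustration; only the 739-gene panel size comes from this thread.

```python
from math import comb

def overlap_pvalue(universe_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes assumed as background
    set_size:      number of genes in the gene set
    list_size:     number of genes in the query (differential) list
    overlap:       number of genes shared by the set and the list
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: a 30-gene set, a 50-gene list, 8 genes in common.
p_panel = overlap_pvalue(739, 30, 50, 8)      # background = the 739-gene panel
p_genome = overlap_pvalue(20000, 30, 50, 8)   # background = assumed genome-wide size
print(p_panel, p_genome)
```

With the same overlap, the p-value computed against the larger assumed background is smaller than the one computed against the 739-gene panel, which is one way the test can look more optimistic than the data warrant. The test also sees only list membership and uses neither cohort size nor expression values.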