I have NanoString data that I would like to analyze.
I have 40 samples in total, split into two cohorts: A (n = 21) and B (n = 19).
The panel measures the expression of 739 genes.
Differentially expressed genes were determined using NanoStringDiff [PMID 27471031].
I investigated the resulting lists of differentially expressed genes using the online "Investigate Gene Sets" tool to determine overlaps with MSigDB curated gene sets.
The top 100 gene set overlaps were computed using the default FDR threshold of 0.05.
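
For context, my understanding is that the overlap tool scores each gene set with a hypergeometric test. Below is a minimal sketch of that kind of test, not the tool's actual code. The function name and every count in the example are hypothetical, not my real results. It compares a background of the 739 panel genes with an assumed genome-wide background of 20,000 genes, since I suspect the choice of background matters for a targeted panel.

```python
from math import comb

def overlap_p_value(k, N, K, n):
    """Hypergeometric P(overlap >= k).

    k: genes shared by the query list and the gene set
    N: size of the background (gene universe)
    K: size of the gene set within that background
    n: size of the query list (differentially expressed genes)
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability of every overlap size from k up to the maximum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical counts, for illustration only.
query_size = 50      # assumed number of differentially expressed genes
gene_set_size = 30   # assumed gene set size
overlap = 8          # assumed number of shared genes

p_panel = overlap_p_value(overlap, 739, gene_set_size, query_size)
p_genome = overlap_p_value(overlap, 20000, gene_set_size, query_size)  # assumed background size

print(f"panel background (739 genes):     p = {p_panel:.3g}")
print(f"genome-wide background (assumed): p = {p_genome:.3g}")
```

With the same overlap, the genome-wide background gives a much smaller p-value than the panel background, which may be part of why the overlap results look more significant than the GSEA results.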
Although there was quite a bit of redundancy in the list of gene sets, several of the overlapping gene sets were of interest.
However, when I analyzed the same data using GSEA, the gene sets deemed significant by the overlap analysis were not significant.
I understand that GSEA accounts for actual expression levels whereas the overlap analysis does not, but my question is: when is it appropriate to use one analysis over the other?
Does the cohort size or the number of genes impact the decision?
Is the output of the online overlap analysis tool suitable for publication?
Many thanks in advance for your help.
amag