Hi Antony,
My mistake. I am not using all gene sets; I am using the hallmark collection, which I understand is suggested for an initial analysis and to guide further research. This is already quite a small collection, so maybe it didn't have as much of an inflating effect as I thought. My approach is to investigate the more significant pathways and use them to guide the curation of a collection. I've seen this done in a few papers, but I'm not sure how it's done. Is the idea to extract gene sets related to a particular biological phenomenon, such as thrombosis or B cell proliferation, from a series of relevant collections, and then to use this curated collection in the analysis? I've seen quite a few posts on here about using your own set of gene sets. Is that a common or suggested approach to gain finer resolution of enriched pathways following an initial analysis with the hallmark collection? If so, do you have any suggestions regarding this approach?
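To make the question concrete, here is a rough sketch of what I imagine the curation step might look like. It assumes the source collections are in GMT format (one gene set per line, tab-separated: set name, description, then the genes). The set names and gene symbols in the toy example are made up, and matching keywords against set names is only my guess at how this could be done, not something I have seen recommended.

```python
from typing import Dict, Iterable, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT text into {set name: [genes]}."""
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def select_by_keyword(
    gene_sets: Dict[str, List[str]], keywords: Iterable[str]
) -> Dict[str, List[str]]:
    """Keep the sets whose name contains any keyword (case-insensitive)."""
    wanted = [k.upper() for k in keywords]
    return {
        name: genes
        for name, genes in gene_sets.items()
        if any(k in name.upper() for k in wanted)
    }


def to_gmt(gene_sets: Dict[str, List[str]], description: str = "custom") -> str:
    """Serialise the curated sets back to GMT text."""
    lines = ["\t".join([name, description, *genes]) for name, genes in gene_sets.items()]
    return "\n".join(lines) + "\n"


# Hypothetical toy collection; real input would be the text of one or more GMT files.
toy = (
    "EXAMPLE_B_CELL_PROLIFERATION\tna\tGENE1\tGENE2\n"
    "EXAMPLE_LIPID_METABOLISM\tna\tGENE3\tGENE4\n"
)
curated = select_by_keyword(parse_gmt(toy), ["B_CELL"])
curated_gmt = to_gmt(curated)
```

The resulting text could then be saved as a .gmt file and supplied as the gene set database in place of the hallmark collection, if that is indeed the intended workflow.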
Interestingly, when I used equalize_and_balance on my uneven phenotype comparison (n = 17 and 55), the FDR values increased. When I then used equalize_and_balance on another, more even phenotype comparison (n = 37 and 35), the FDR values became more significant. Both comparisons use the same data from the same cohort.
Thank you for your suggestions. They were very helpful. For my uneven phenotype comparison (the one in my first post), I am still not getting significant results, with the lowest FDR being around 0.6 on equalize_and_balance and 0.36 on no_balance. For my more even phenotype comparison, I'm getting a few significant results, with the lowest FDR being around 0.2 on equalize_and_balance and 0.225 on no_balance.
I have another (possibly naive) question on the interpretation of enrichment plots like the one below. It is my understanding that this means the interferon gamma response pathway is significantly upregulated in the right-hand 'HighNeut' group. Is that to say it is significantly downregulated in the left-hand 'LowNeut' group?
Thank you for your help. I really appreciate it.