Hello,
I am working with single-cell data where the capture is low.
After filtering genes for a minimum expression level, I am left with 2000 genes.
For each gene, I run a regression with the gene's expression as the response variable, the quality I am interested in as the explanatory variable, and one confounding variable as a covariate.
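To make the setup concrete, the per-gene model can be written as below. The notation is my own shorthand, not from any particular package: $y_{gi}$ is the expression of gene $g$ in cell $i$, $x_i$ is the quality of interest, $c_i$ is the confounder, and the test of interest is on $\beta_{1g}$. With 2000 genes this means 2000 separate fits.

```latex
y_{gi} = \beta_{0g} + \beta_{1g}\, x_i + \beta_{2g}\, c_i + \varepsilon_{gi},
\qquad g = 1, \dots, 2000,
\qquad H_0 : \beta_{1g} = 0
```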
For each gene, I compute sign(coefficient) * -log10(p-value) for the explanatory variable I am interested in, and feed this as the ranking metric into pre-ranked GSEA.
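A minimal sketch of how such a ranked list might be built, assuming the per-gene regression results are already available as a mapping from gene name to (coefficient, p-value). The helper names (`rank_metric`, `build_ranked_list`, `to_rnk`) and the `p_floor` clipping are illustrative assumptions, not part of any GSEA tool; the two-column tab-separated output is meant to resemble a pre-ranked `.rnk` file.

```python
import math

def rank_metric(coef: float, pvalue: float, p_floor: float = 1e-300) -> float:
    """sign(coef) * -log10(pvalue); p-values are clipped at p_floor (an assumption)
    so that a reported p-value of 0 does not produce an infinite score."""
    sign = (coef > 0) - (coef < 0)  # +1, -1, or 0 when the coefficient is exactly zero
    return sign * -math.log10(max(pvalue, p_floor))

def build_ranked_list(results: dict) -> list:
    """results maps gene -> (coef, pvalue); returns (gene, metric) pairs, highest first."""
    ranked = [(gene, rank_metric(coef, p)) for gene, (coef, p) in results.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

def to_rnk(ranked: list) -> str:
    """Render the ranked list as tab-separated 'gene<TAB>metric' lines."""
    return "\n".join(f"{gene}\t{metric:.6g}" for gene, metric in ranked) + "\n"

# Usage with made-up placeholder values (not real results):
example = {"GENE_A": (0.8, 1e-4), "GENE_B": (-0.5, 0.01), "GENE_C": (0.1, 0.5)}
print(to_rnk(build_ranked_list(example)), end="")
```

Clipping at a floor is one way to keep extremely significant genes from becoming infinite; capping or ranking by the test statistic instead are alternatives.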
My question is whether the small number of genes my statistical test starts with can make the FDR values in GSEA's results artificially small, causing false positives.
If starting with a small subset of genes can only make the FDR artificially large (causing false negatives), that is not a problem; what worries me is that the results I do see are false positives.