Hi Eleni,
No, GSEA does not perform a log transformation internally. Applying a log transformation fundamentally changes the math behind the signal-to-noise ratio, because the means and standard deviations are then computed on a different scale. That could be the cause of the odd results you saw when using that data.
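To illustrate the point about the signal-to-noise ratio, here is a minimal sketch in Python. It uses the basic form of the metric, (mean_A - mean_B) / (sd_A + sd_B), and leaves out any adjustments GSEA itself makes to the standard deviations. The values are made up for illustration and are not from your dataset.

```python
import math
from statistics import mean, stdev

def signal_to_noise(group_a, group_b):
    """Basic signal-to-noise: difference of means over sum of standard deviations."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))

# Hypothetical normalized counts for one gene in two phenotypes.
treated = [10.0, 100.0, 1000.0, 10000.0]
control = [5.0, 10.0, 20.0, 40.0]

raw_score = signal_to_noise(treated, control)

# The same gene after a log2(x + 1) transformation of every value.
log_score = signal_to_noise(
    [math.log2(v + 1) for v in treated],
    [math.log2(v + 1) for v in control],
)

# The two scores differ, so the gene can land at a different rank.
print(raw_score, log_score)
```

Because every gene is affected differently depending on its spread, the ordering of the whole ranked list can shift, not just the scale of the scores.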
We always recommend using the standard mode of GSEA when you have access to the underlying counts matrix.
It is atypical to see that kind of asymmetrical enrichment result. It isn't impossible, and it definitely does happen, but it is odd.
Enrichment of ribosomal biogenesis isn't necessarily an invalid result if everything else looks fine with the dataset.
The collapsing mode should be chosen based on the nature of the values being provided. For RNA-seq data, for example, it (generally) makes sense to sum the counts when collapsing, because each count is a discrete "real" object: if two annotated sequences map to the same gene, the underlying fragments that existed in the sample both contribute directly to that gene. Intensity values from a microarray, on the other hand, aren't a discrete measure, so it doesn't really make sense to sum them. For those we recommend the "max probe" approach, which was the standard for that sort of data analysis.
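As a rough sketch of the difference between the two modes described above (this is not GSEA's own code; the function name, gene names, and values are hypothetical, and it assumes the maximum is taken separately for each sample):

```python
from collections import defaultdict

def collapse_matrix(rows, mode):
    """Collapse (gene, per-sample values) rows to one row per gene.

    mode="sum" adds the values for each sample (suits discrete counts);
    mode="max" keeps the largest value for each sample (suits intensities).
    """
    if mode not in ("sum", "max"):
        raise ValueError("mode must be 'sum' or 'max'")
    combine = sum if mode == "sum" else max

    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)

    # zip(*vals) walks the rows column by column, i.e. sample by sample.
    return {gene: [combine(col) for col in zip(*vals)]
            for gene, vals in grouped.items()}

# Two hypothetical sequences map to GENE1, one to GENE2; two samples each.
rows = [("GENE1", [10, 20]), ("GENE1", [5, 30]), ("GENE2", [7, 8])]

summed = collapse_matrix(rows, "sum")
maxed = collapse_matrix(rows, "max")
print(summed)
print(maxed)
```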
When operating on ranked lists, however, neither of these really makes sense. If you have two annotated sequences and one increased 2-fold while the other increased 1.5-fold, it doesn't make sense to say that the shared gene they map to increased 3.5-fold, so "sum" doesn't work. And in a scenario where one went up by 1.5 (log2FC) but another went down by 2 (i.e. -2 log2FC), the "max" option would always keep the positive value, which isn't valid either. A weighted average would probably be the best way to collapse multiple sequences to a single gene, but we don't have any way to determine how to weight them from the data we have access to in the ranked list, so we settle for taking the "most extreme" change (abs_max_of_probes). This avoids both the issues with summing and the positive bias of the max probe approach.
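Here is a small Python sketch of the three behaviours on a ranked list, using the +1.5 / -2 example from above. It only illustrates the idea and is not GSEA's implementation; the function name and gene names are hypothetical, and ties in absolute value are resolved by keeping whichever entry comes first.

```python
def collapse_ranked(entries, mode):
    """Collapse (gene, score) pairs from a ranked list to one score per gene."""
    reducers = {
        "sum": sum,                                   # adds the changes together
        "max": max,                                   # always favours the positive value
        "abs_max": lambda scores: max(scores, key=abs),  # most extreme change, sign kept
    }
    reduce_scores = reducers[mode]

    grouped = {}
    for gene, score in entries:
        grouped.setdefault(gene, []).append(score)
    return {gene: reduce_scores(scores) for gene, scores in grouped.items()}

# Two hypothetical sequences map to GENE1 with opposite log2 fold changes.
entries = [("GENE1", 1.5), ("GENE1", -2.0), ("GENE2", 0.8)]

for mode in ("sum", "max", "abs_max"):
    print(mode, collapse_ranked(entries, mode))
```

For GENE1, "sum" gives -0.5, "max" gives 1.5, and "abs_max" gives -2.0, so only the last one keeps the larger change together with its direction.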
Sorry we don't offer a better explanation of this in the user guide, but hopefully this makes sense.
Let me know if you have more questions!