Permutation type in proteomics

Martin Menkyna

Mar 27, 2023, 4:31:29 AM

to gsea-help

Greetings everyone.

I'd like to ask about the use of GSEA in a proteomics study and, in particular, about the choice of permutation type. In genomics, if I understand it correctly, you have thousands upon thousands of genes in a study. In proteomics the situation is quite different: for instance, our measurements peak at something over 1000 proteins found in a single study.

Now, my question is whether this discrepancy in the number of "genes" affects which permutation type can be used. The User Guide says of gene set permutation that "This method is useful when you have too few samples to do phenotype permutations (that is, when you have fewer than seven (7) samples in any phenotype)." However, gene set permutation gives much better results, at least on the surface, when I use it with ~20 samples in each phenotype and, as I said, about 700-900 proteins in each sample.

Is running gene set permutation still a mistake even in the case of proteomics, with far fewer genes in each sample?

Thank you for any advice; hopefully my question is not impossible to understand.

Martin Menkyna

May 10, 2023, 7:59:53 AM

to gsea-help

Hello.

I'd just like to reiterate my inquiry. Could anyone help me with this dilemma, please?

Thank you.

Castanza, Anthony

May 10, 2023, 1:32:52 PM

to gsea...@googlegroups.com

Hi Martin,

My apologies for missing your question; it somehow slipped through the cracks. Unfortunately, I can’t really give hard recommendations for proteomics data, as this isn’t an area we have explicit support for, but I hope these answers help somewhat.

The key question for gene set permutation is this: for a given expression dataset and a gene set of a particular size, can I draw enough unique random sets of that size from the dataset to construct a valid null distribution? For expression datasets from microarrays or RNA-seq, where you have tens of thousands of genes, the answer is always yes. For small datasets like proteomics, your chances of constructing materially different sets to produce a valid null are much lower. That said, if this is failing, I would actually expect the significance statistics to be worse, since the null scores would be more similar to the true scores.
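To make the "materially different sets" point concrete, here is a rough Python sketch. It is not part of GSEA, and the function names are made up for illustration. It assumes a gene set of 100 members and compares how many genes a randomly drawn set of the same size shares with it, on average, in a universe of 20,000 genes versus one of 800 proteins. The expected overlap follows the hypergeometric mean, set_size * set_size / universe_size, so it is 0.5 genes in the first case and 12.5 in the second.

```python
import random


def expected_overlap(universe_size: int, set_size: int) -> float:
    """Hypergeometric mean: genes shared by a random set and a fixed set of equal size."""
    return set_size * set_size / universe_size


def simulated_overlap(universe_size: int, set_size: int,
                      n_perm: int = 1000, seed: int = 0) -> float:
    """Average overlap between a fixed set and n_perm randomly drawn sets."""
    rng = random.Random(seed)
    universe = range(universe_size)
    true_set = set(range(set_size))  # arbitrary fixed "real" gene set
    total = 0
    for _ in range(n_perm):
        random_set = rng.sample(universe, set_size)
        total += len(true_set.intersection(random_set))
    return total / n_perm


if __name__ == "__main__":
    for n_genes in (20000, 800):  # RNA-seq-like vs proteomics-like universe
        print(n_genes,
              expected_overlap(n_genes, 100),
              round(simulated_overlap(n_genes, 100), 2))
```

In the smaller universe each random set already contains a noticeable share of the real set, which is one way the null scores can drift toward the true score.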

As a general case, gene set permutation will (almost) always produce more “significant” p-values/FDRs than phenotype permutation. The phenotype permutation test is just substantially more stringent and does a better job of preserving gene-gene correlation effects that can impact the scores (and should be accounted for), whereas gene set permutation breaks these correlations. We generally recommend the phenotype permutation test if there are enough samples to perform it, as any significant results from it are much more likely to be robust findings. It’s at least partially for this reason that we recommend an FDR<=0.25 cutoff for data run with phenotype permutation, but the standard FDR<=0.05 cutoff for data run with gene set permutation.
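The correlation effect can be illustrated with a toy simulation. This is only a sketch under strong assumptions: it does not use GSEA's enrichment score, but a simple stand-in (the mean between-class difference across the genes in a set), and the data, sizes, and function names are all hypothetical. The simulated set has no real phenotype effect, but its genes share a latent factor, so they are correlated. The sketch then compares the spread of the null distribution obtained by shuffling phenotype labels with the one obtained by drawing random gene sets.

```python
import random
import statistics


def simulate(n_genes=200, n_per_class=10, set_size=20, seed=1):
    """Toy data with no phenotype effect; the first set_size genes share a latent factor."""
    rng = random.Random(seed)
    n_samples = 2 * n_per_class
    factor = [rng.gauss(0, 1) for _ in range(n_samples)]
    data = []
    for g in range(n_genes):
        shared = factor if g < set_size else [0.0] * n_samples
        data.append([shared[s] + rng.gauss(0, 1) for s in range(n_samples)])
    labels = [0] * n_per_class + [1] * n_per_class
    return data, labels, list(range(set_size))


def gene_diff(row, labels):
    """Difference of class means for one gene."""
    a = [x for x, lab in zip(row, labels) if lab == 0]
    b = [x for x, lab in zip(row, labels) if lab == 1]
    return statistics.fmean(a) - statistics.fmean(b)


def set_score(data, labels, gene_set):
    """Stand-in set score (NOT the GSEA enrichment score): mean per-gene difference."""
    return statistics.fmean([gene_diff(data[g], labels) for g in gene_set])


def null_spreads(n_perm=500, seed=2):
    """Standard deviation of the phenotype-permutation and gene-set-permutation nulls."""
    data, labels, gene_set = simulate()
    rng = random.Random(seed)

    # Phenotype permutation: shuffle labels, keep the gene set (and its correlations) intact.
    pheno_null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        pheno_null.append(set_score(data, shuffled, gene_set))

    # Gene set permutation: keep labels, draw random sets of the same size.
    diffs = [gene_diff(row, labels) for row in data]
    geneset_null = []
    for _ in range(n_perm):
        picked = rng.sample(range(len(data)), len(gene_set))
        geneset_null.append(statistics.fmean([diffs[g] for g in picked]))

    return statistics.stdev(pheno_null), statistics.stdev(geneset_null)


pheno_sd, geneset_sd = null_spreads()
print(f"phenotype null sd: {pheno_sd:.3f}, gene set null sd: {geneset_sd:.3f}")
```

In this toy setup the gene set null is markedly narrower than the phenotype null, because random sets are mostly made of uncorrelated genes. A score from a correlated set therefore looks more extreme against the gene set null, which is consistent with the more "significant" results described above.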

Sorry I couldn’t be of more help. Let me know if you have follow-up questions and I’ll answer as best I can.

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

Mesirov Lab, Department of Medicine

University of California, San Diego
