--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gsea-help/9fe31215-a6d7-4c28-8bd8-d9028ddc34dbn%40googlegroups.com.
Hi Wendy,
I wouldn't have recommended going much beyond 10,000. I haven't crunched the numbers on the actual maximum number of valid permutations for a gene list of N features, but it's going to hit a point of diminishing returns, especially given the memory usage tradeoff you've noticed.
The issue is that the gene sets with NaN statistics could very well be considered hits (or at least candidates). Think of it this way: suppose a set received an enrichment score of, say, +0.8 in the real data. When we generated the permutation matrix for a set of that size, zero permuted sets in that matrix also had a positive sign (or vice versa: for a set with a negative enrichment score in the real data, there would have been no results with a negative sign in the permuted matrix for that set).
GSEA in Preranked/gene set permutation mode calculates statistics by taking the real enrichment score as a threshold for evaluating the null distribution, asking "how many times in the null distribution did the permutations for this set score as well as the real set?" So let's say you have an ES of +0.8 again, and you did 1000 permutations. Of those 1000 permutations, 500 have a positive score, and of those 500, only 2 have a score >= +0.8, so the p-value for that set would be 2/500, or 0.004. The FDR is a little more complicated in that it looks at the global distribution of set scores, but it basically follows the same principle.
The problem arises from that "2/500" calculation. For the affected sets there are not 500 permutations in the denominator, there are 0, which causes a divide by zero. By increasing the permutation number, we were trying to eliminate that divide by zero.
In this case, it unfortunately appears as if the skew in the dataset is too extreme to overcome by increasing the permutation number, so those sets can't be fully evaluated, but GSEA also isn't able to rule them out entirely either.
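The nominal p-value arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the same-sign counting and the empty-denominator case, not GSEA's actual implementation; the function name `nominal_p_value`, the example null scores, and the choice to count a score of exactly 0 as positive are all assumptions made for the sketch.

```python
import math

def nominal_p_value(real_es, null_es):
    """Fraction of same-sign null scores at least as extreme as real_es.

    Hypothetical helper for illustration; treating 0 as positive is an assumption.
    """
    if real_es >= 0:
        same_sign = [es for es in null_es if es >= 0]
        extreme = [es for es in same_sign if es >= real_es]
    else:
        same_sign = [es for es in null_es if es < 0]
        extreme = [es for es in same_sign if es <= real_es]
    if not same_sign:
        # No permutation shares the sign of the real score, so the
        # denominator is zero and the statistic is undefined.
        return math.nan
    return len(extreme) / len(same_sign)

# 1000 permutations: 500 positive (2 of them >= 0.8) and 500 negative.
balanced_null = [0.9, 0.85] + [0.3] * 498 + [-0.4] * 500
p_balanced = nominal_p_value(0.8, balanced_null)

# Extremely skewed null: no positive scores at all for this set size.
skewed_null = [-0.4] * 1000
p_skewed = nominal_p_value(0.8, skewed_null)
```

With the balanced null the result is 2/500; with the skewed null there is nothing to divide by, which corresponds to the NaN values in the report.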
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
From: gsea...@googlegroups.com <gsea...@googlegroups.com> on behalf of 陳冠雅 <t771...@gmail.com>
Date: Tuesday, January 18, 2022 at 11:54 PM
To: gsea...@googlegroups.com <gsea...@googlegroups.com>
Subject: Re: [gsea-help] GSEAPreranked is successful with warnings
Hi Anthony,
Thanks for answering my questions. I now understand why the warning appears and what it means.
I set the permutation number to 10000, 20000, 40000, and 60000, but there are still "NaN" values in NES and NOM p-val, though for fewer and fewer pathways. When I set the permutation number to 100000, I got an error: OutOfMemoryError: Java heap space. I had already increased the heap space with -Xmx8g, which is the maximum memory available. So I don't think there is anything more I can do to improve the result or resolve the warning. However, I only need NES and FDR q-val for the top 50 GOBP gene sets, so those "NaN" values shouldn't affect the other data I need, right?
Thanks!
Wendy
On Wednesday, January 19, 2022 at 1:44 AM, Anthony Castanza <acas...@gmail.com> wrote:
Hi Wendy,
Ah, I see, I misunderstood the error message. This is a slightly different error where rather than NA/NaN values in the input ranked list, GSEA has instead failed to generate a valid null distribution for those gene sets.
Looking at the data you sent to gsea-team, it appears that Ranked List 1 has many more negative correlations than positive ones and, conversely, Ranked List 2 has many more positive correlations than negative ones. With highly unbalanced distributions like this, GSEA can fail to generate nulls that are capable of producing enrichments for sets on the smaller side. Increasing the permutation number is really the only thing we can offer as a potential solution here, since this is something that occurs due to the nature of the datasets themselves.
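A quick way to see how unbalanced a ranked list is would be to count the positive and negative scores before running GSEAPreranked. The following is a minimal sketch, assuming a tab-separated list with the gene name in the first column and the score in the second; the helper names and the example genes are hypothetical, and lines whose second column is not a number (such as a header) are simply skipped.

```python
def parse_ranked_lines(lines):
    """Extract numeric scores from 'gene<TAB>score' lines (assumed layout)."""
    scores = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue
        try:
            scores.append(float(parts[1]))
        except ValueError:
            continue  # header or malformed line
    return scores

def sign_balance(scores):
    """Return (n_positive, n_negative, fraction_positive) ignoring zeros."""
    pos = sum(1 for s in scores if s > 0)
    neg = sum(1 for s in scores if s < 0)
    total = pos + neg
    frac_pos = pos / total if total else float("nan")
    return pos, neg, frac_pos

# Hypothetical four-gene list, heavily weighted toward negative scores.
example = ["GENE_A\t0.9", "GENE_B\t-0.2", "GENE_C\t-0.5", "GENE_D\t-0.7"]
pos, neg, frac_pos = sign_balance(parse_ranked_lines(example))
```

A positive fraction far from 0.5 suggests the kind of skew discussed here; where exactly the skew becomes a problem is not something this sketch can determine.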
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
On Monday, January 17, 2022 at 6:19:05 PM UTC-8 陳冠雅 wrote:
Hi,
I sent my ranked list to gsea...@broadinstitute.org with the title "[gsea-help] GSEAPreranked is successful with warnings".
I think the warnings appear because the gene sets at the bottom and top have "NaN" for NES and NOM p-val. This issue was also reported before: https://groups.google.com/g/gsea-help/c/X2rF94Pxn-Q. So should I increase the number of permutations to maybe 10000 or higher?
Thanks!
Wendy
Hi Fazhir,
Generally we ask that people create new threads with their specific issues so that they can be directly addressed, rather than bumping old threads.
Could you tell me a little bit more about the dataset you’re using here, particularly number of samples in each phenotype group, permutation mode used, ranking metric, etc? It is difficult to make any determinations from the single plot you’ve shown. Based on general failure modes of GSEA however, my initial thought is that your experiment might be under-powered for the statistical assumptions of the GSEA method.
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
From: David Eby
Sent: Monday, July 3, 2023 11:13 AM
To: gsea...@googlegroups.com
Subject: Re: [gsea-help] Re: GSEAPreranked is successful with warnings
Hi Fazhir,
Anthony is traveling for an extended holiday and won't be back until later this week. I'm sorry to say that I can't offer much help on your question, but I'll make sure he sees this when he returns.
On Fri, Jun 30, 2023 at 4:28 PM Kayondo Fadhir <kamf...@gmail.com> wrote:
Hi Anthony,
I came across this thread and found it useful, as it addresses the same concern I am having.
In addition to that error message, my results tend to have every gene set at FDR = 1. Although the GO terms at the top have known biological relationships with my trait of interest, I cannot call any of them significant because none has FDR < 25%.
To follow up on why all the FDRs are 1, I plotted a histogram of the nominal p-values and found that it is U-shaped (please see the plot below). Is it usual for this plot to be U-shaped in a normal run? If not, could you help me trace where the issue is?
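For anyone who wants to reproduce this kind of check without a plotting tool, the nominal p-values can be binned into a simple count histogram. This is a minimal sketch only: the function name `p_value_histogram` and the example values are hypothetical, and NaN entries are skipped rather than counted.

```python
def p_value_histogram(p_values, bins=10):
    """Count p-values in equal-width bins over [0, 1], skipping NaN."""
    counts = [0] * bins
    for p in p_values:
        if p != p:  # NaN is the only value not equal to itself
            continue
        # A p-value of exactly 1.0 goes into the last bin.
        idx = min(int(p * bins), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical values piled up near 0 and near 1.
example_p = [0.0, 0.01, 0.05, 0.5, 0.95, 0.99, 1.0, float("nan")]
counts = p_value_histogram(example_p)
```

Counts that are high in the first and last bins and low in between correspond to the U shape described here.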
I will appreciate any assistance. Thanks
Fazhir
On Sunday, 16 January 2022 at 23:58:56 UTC-6 陳冠雅 wrote:
Hello,
I am running GSEAPreranked in GSEA Desktop with the enrichment statistic set to weighted. The values in my ranked gene list are correlation coefficients between -1 and 1, with no NA or inf values, and they are biologically meaningful. When the analysis finished, it reported success but with warnings (shown in pink).
I checked the index.html page, which showed the following text: "Scoring produced infinite or NaNs values which may have prevented plotting for certain gene sets. See the log for more details". Also, the log showed warnings for 72 GOBP gene sets, for example:
985900 [WARN ] - Scoring of GOBP_MONONUCLEAR_CELL_DIFFERENTIATION produced infinite Or NaN value(s)
985900 [WARN ] - Scoring of GOBP_REGULATION_OF_HEMOPOIESIS produced infinite Or NaN value(s)
985900 [WARN ] - Scoring of GOBP_REGULATION_OF_APOPTOTIC_SIGNALING_PATHWAY produced infinite Or NaN value(s)
985901 [INFO ] - Scoring produced infinite or NaNs values which may have prevented plotting for certain gene sets. See the log for more details.
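To list every affected gene set rather than reading the log by eye, the warning lines can be matched with a regular expression. The sketch below assumes the warnings always follow the wording quoted in this message; the pattern and the function name `affected_gene_sets` are illustrative, not part of GSEA.

```python
import re

# Matches warning lines of the form quoted in the log excerpt above.
WARN_PATTERN = re.compile(
    r"\[WARN\s*\]\s*-\s*Scoring of (\S+) produced infinite Or NaN value\(s\)"
)

def affected_gene_sets(log_text):
    """Return gene set names from NaN/infinite scoring warnings, in log order."""
    return WARN_PATTERN.findall(log_text)

log_excerpt = (
    "985900 [WARN ] - Scoring of GOBP_MONONUCLEAR_CELL_DIFFERENTIATION "
    "produced infinite Or NaN value(s)\n"
    "985900 [WARN ] - Scoring of GOBP_REGULATION_OF_HEMOPOIESIS "
    "produced infinite Or NaN value(s)\n"
    "985901 [INFO ] - Scoring produced infinite or NaNs values which may "
    "have prevented plotting for certain gene sets. See the log for more details.\n"
)
names = affected_gene_sets(log_excerpt)
```

The resulting list can then be compared against the gene sets of interest, for example the top 50 by NES, to see whether any of them are affected.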
However, when I set the parameter for enrichment statistic to classic, the analysis is successful without any warnings.
So I would like to know why the warnings appear, and whether I should change the enrichment statistic from weighted to classic.
Thanks and best regards,
Wendy