a small patch for q-value calculation


murase...@gmail.com

Aug 6, 2018, 11:55:35 AM
to Pyteomics
Hello,

I found a problem with the q-value adjustment that enforces monotonicity.
It arises when the q-values are not monotonic with increasing decoy hits.

In such a case, I think the FDR estimation is not reliable.
Even worse, the current implementation of monotonicity enforcement
may hide the problem, especially when the q-value curve has a local minimum.

Please find attached a small patch to solve this problem.

Thank you,
Masaki Murase





target_decoy.py.diff

Lev Levitsky

Aug 6, 2018, 4:39:24 PM
to pyteomics, murase...@gmail.com
Hi,

Thanks for writing in.
Your patch pertains to the case when q[i - 1] > q[i]. If I'm not mistaken, this is only true if there is a target PSM in position i, which should in turn guarantee that cumsum[i - 1] == cumsum[i], making the "and" clause redundant. If that's not the case and the two versions can produce different results, then my logic is flawed and there is indeed a problem. Could you suggest an example (a list of scores and decoy flags) for which the patch changes the output?

Best regards,
Lev







--

---
You received this message because you are subscribed to the Google Groups "Pyteomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyteomics+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Lev Levitsky
Institute for Energy Problems of Chemical Physics RAS
Laboratory of Physical and Chemical Methods for Structure Analysis
Leninsky pr. 38, bld. 2 119334 Moscow Russia
tel: +7 499 1378257 fax: +7 499 1378257, +7 499 1378258

Lev Levitsky

Aug 7, 2018, 11:50:04 AM
to pyteomics, Murase Masaki
An update after more discussion of the issue: I was wrong about the patch being a no-op. Due to monotonization, it may happen that q[i - 1] > q[i] but cumsum[i - 1] == cumsum[i]. I think I understand what you mean now. Indeed, if there is a local minimum on the curve of FDRs, the calculation procedure will monotonize the q-values so that the minimal value is propagated all the way up until the FDR goes even lower.
I believe this is in accordance with the adopted definition of a q-value as "the minimal FDR at which a PSM is accepted" [1]. This behavior is also consistent with our description of the procedure in [2].
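
The propagation of a local minimum can be illustrated with a minimal NumPy sketch. It assumes the simplest FDR estimate (decoys divided by targets, with no correction or decoy-to-target ratio) and is not the Pyteomics implementation; the names raw_fdr and monotonize are hypothetical and chosen only for this illustration.

```python
import numpy as np

def raw_fdr(is_decoy):
    """Running FDR estimate (decoys / targets) along a score-sorted PSM list."""
    is_decoy = np.asarray(is_decoy, dtype=bool)
    decoys = np.cumsum(is_decoy)      # decoys accepted up to each position
    targets = np.cumsum(~is_decoy)    # targets accepted up to each position
    # Guard against division by zero if the list starts with decoys.
    return decoys / np.maximum(targets, 1)

def monotonize(fdr):
    """q[i] = min(fdr[i:]), i.e. the minimal FDR at which PSM i is accepted."""
    return np.minimum.accumulate(fdr[::-1])[::-1]

# Best-scoring PSM first; True marks a decoy (made-up flags, not real data).
flags = [False, True, False, False, False, True, False]
fdr = raw_fdr(flags)
q = monotonize(fdr)
print(fdr)  # raw FDR has a local minimum of 0.25 at position 4
print(q)    # that minimum is propagated back to positions 1 through 3
```

With these flags the raw FDR reaches 1.0 at position 1, while the q-value there is 0.25, because the later local minimum is carried backwards until position 0, where the FDR is lower still.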

The suggested patch, on the other hand, seems to break the monotonicity of q-values. Here's a (poorly plotted) example, where each green line is a target PSM and each red one is a decoy, and sorted PSMs go from left to right:

[plot attachment not shown]

[1] Käll, L., Storey, J. D., MacCoss, M. J., & Noble, W. S. (2008). Posterior error probabilities and false discovery rates: Two sides of the same coin. Journal of Proteome Research, 7, 40–44. http://doi.org/10.1021/pr700739d
[2] Levitsky, L. I., Ivanov, M. V., Lobas, A. A., & Gorshkov, M. V. (2017). Unbiased False Discovery Rate Estimation for Shotgun Proteomics Based on the Target-Decoy Approach. Journal of Proteome Research, 16(2), 393–397. http://doi.org/10.1021/acs.jproteome.6b00144

Best regards,
Lev

Masaki Murase

Aug 8, 2018, 1:41:28 PM
to pyteomics
Thank you for your reply and helpful plot.
I also attached example data with permission from my colleague.

You are right: my patch is incomplete and breaks the monotonicity of q-values.
I mostly agree with you, but I would still ask you to add an option to calculate q-values
before the complete monotonization.

Suppose we develop a new decoy database or a novel scoring algorithm.
I think FDR or q-values without complete monotonization would be more helpful for
checking their validity and reliability.
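
As a rough sketch of such a check, the snippet below reports the positions where the un-monotonized FDR is higher than the monotonized q-value, that is, where monotonization masks a worse local estimate. It assumes a plain decoys/targets FDR estimate on a list already sorted from best to worst score; the function name masked_positions is hypothetical and the flags are made up, not taken from the attached data.

```python
import numpy as np

def masked_positions(is_decoy):
    """Return indices where raw FDR exceeds the q-value, plus the gap itself."""
    is_decoy = np.asarray(is_decoy, dtype=bool)
    decoys = np.cumsum(is_decoy)
    targets = np.cumsum(~is_decoy)
    fdr = decoys / np.maximum(targets, 1)          # raw, not monotonized
    q = np.minimum.accumulate(fdr[::-1])[::-1]      # monotonized q-values
    gap = fdr - q                                   # how much is hidden
    return np.flatnonzero(gap > 0), gap

# Two decoys score better than most targets (illustrative flags only).
flags = [False, False, True, True, False, False, False, False]
positions, gap = masked_positions(flags)
print(positions)            # positions affected by monotonization
print(gap.max())            # largest difference between raw FDR and q-value
```

A large gap over a long stretch of positions would suggest that the decoy database or scoring function deserves a closer look before the q-values are trusted.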



Thank you,
Masaki Murase


problem_example_data.csv

Lev Levitsky

Aug 10, 2018, 12:24:25 PM
to pyteomics, Masaki Murase
FDRs without monotonization can be calculated using the fdr() function and a for loop, which I used for the plot I showed, but having a vectorized version of fdr() would indeed be handy for this kind of analysis. I'll try refactoring the code we have to expose this functionality.
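
For illustration, here is a minimal sketch of the two approaches side by side. It does not call the Pyteomics fdr() function; fdr_loop and fdr_vectorized are hypothetical stand-ins that assume a plain decoys/targets estimate, lower scores being better, and no tied scores.

```python
import numpy as np

def fdr_loop(scores, is_decoy):
    """Un-monotonized FDR at every score threshold, one threshold at a time."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    out = []
    for threshold in np.sort(scores):
        accepted = scores <= threshold
        d = np.count_nonzero(is_decoy & accepted)
        t = np.count_nonzero(~is_decoy & accepted)
        out.append(d / max(t, 1))
    return np.array(out)

def fdr_vectorized(scores, is_decoy):
    """Same values computed with one sort and two cumulative sums."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    order = np.argsort(scores, kind="stable")   # best (lowest) score first
    sorted_decoy = is_decoy[order]
    d = np.cumsum(sorted_decoy)
    t = np.cumsum(~sorted_decoy)
    return d / np.maximum(t, 1)

# Made-up scores and decoy flags, deliberately unsorted.
scores = [0.03, 0.01, 0.08, 0.02, 0.05]
is_decoy = [False, False, True, True, False]
loop_result = fdr_loop(scores, is_decoy)
vec_result = fdr_vectorized(scores, is_decoy)
print(np.allclose(loop_result, vec_result))
```

The loop recounts all PSMs for every threshold, so its cost grows quadratically with the number of PSMs, whereas the vectorized version is dominated by the sort.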

Masaki Murase

Aug 10, 2018, 9:50:15 PM
to pyteomics
Thank you, Lev.
I'll use the vectorized version of fdr() once it is available.

Best regards,
Masaki
