Comet DB search very slow in SearchGUI


Pavel Rehulka

May 11, 2020, 8:36:19 AM
to Comet-ms support
Hi,

I recently tried to run a simple DB search in SearchGUI-3.3.20 (+ PeptideShaker 1.16.45) with the Comet engine on a raw file from a QExactive, obtained by analyzing a simple standard protein digest (BSA). I ran the search on my laptop (Windows 10, Java 64-bit, ProteoWizard 3.0.20127.d7c32a1ac). With other DB engines the search usually takes 1-2 min (or about 6 min for Andromeda), but the Comet DB search took about 2.5 hours! The difference is huge, and I do not know what I might have set up wrongly. I tried to solve this in the SearchGUI forum, and after a discussion with Harald Barsnes (https://github.com/compomics/searchgui/issues/240, which solved other issues as well - thanks Harald!) I was advised to ask here. I summarized the output from SearchGUI with some log files here:
There are also some screenshots related to a separate issue with PeptideShaker not opening directly, but I guess these are not related to the slow Comet processing on my laptop. Maybe my search parameters are incorrect, but I cannot tell. If necessary, I can provide further technical information about the DB search and the software settings.

Thank you for your help!

Best regards,
Pavel

Jimmy Eng

May 11, 2020, 3:19:32 PM
to Comet-ms support
Pavel,

In your search output, I see that Comet 2018.01 rev. 3 was used. That version unfortunately included a strlen() call on the protein sequence in an inner loop, which had a significant detrimental effect on search times. I encourage you to update to at least Comet 2018.01 rev. 4 and repeat this search.
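To illustrate why a length call inside a loop can hurt so much, here is a toy cost model in Python. It is not Comet's code: the function names and the sequence length of 600 residues are assumptions made for illustration. It only counts character visits, relying on the fact that C's strlen() scans the whole string on every call.

```python
def steps_with_strlen_in_loop(seq_len: int) -> int:
    """Character visits when the length is re-computed on every iteration."""
    steps = 0
    for _ in range(seq_len):
        steps += seq_len  # strlen() walks the whole sequence again
        steps += 1        # the per-residue work itself
    return steps


def steps_with_hoisted_length(seq_len: int) -> int:
    """Character visits when the length is computed once, before the loop."""
    steps = seq_len       # a single strlen() call
    for _ in range(seq_len):
        steps += 1        # the per-residue work itself
    return steps


# Hypothetical protein of 600 residues (illustrative value only).
slow = steps_with_strlen_in_loop(600)
fast = steps_with_hoisted_length(600)
print(slow, fast, slow / fast)
```

In this model the cost grows quadratically with sequence length in the first case and linearly in the second, so the penalty gets worse for longer proteins.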

Updating Comet will get your search down from 2.5 hours to ~10 minutes (pure guess), which is still slow because Comet handles multiple variable modifications poorly. I would combine the -17 pyro-glu mod on Q with the same mod on cysteine, as noted below; for me, that small change improved search times by over 20%.

from:

variable_mod01 = 15.994915 M 0 3 -1 0 0
variable_mod02 = -17.026549 Q 0 1 0 2 0
variable_mod03 = -17.026549 C 0 1 0 2 0
variable_mod04 = -18.010565 E 0 1 0 2 0
variable_mod05 = 42.010565 n 0 1 0 0 0

to:

variable_mod01 = 15.994915 M 0 3 -1 0 0
variable_mod02 = -17.026549 QC 0 1 0 2 0
variable_mod03 = -18.010565 E 0 1 0 2 0
variable_mod04 = 42.010565 n 0 1 0 0 0
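The same merge can be sketched as a small Python helper. This is an illustration, not part of Comet or SearchGUI: `merge_variable_mods` is a hypothetical name, and the code assumes that two entries may be combined only when the mass and all trailing fields are identical, so that only the residue field differs. The trailing fields are treated as opaque text and are not interpreted.

```python
ORIGINAL = [
    "variable_mod01 = 15.994915 M 0 3 -1 0 0",
    "variable_mod02 = -17.026549 Q 0 1 0 2 0",
    "variable_mod03 = -17.026549 C 0 1 0 2 0",
    "variable_mod04 = -18.010565 E 0 1 0 2 0",
    "variable_mod05 = 42.010565 n 0 1 0 0 0",
]


def merge_variable_mods(lines):
    """Combine entries that differ only in their residue field."""
    groups = {}  # (mass, trailing fields) -> list of residue strings, in first-seen order
    for line in lines:
        _, value = line.split("=", 1)
        fields = value.split()
        mass, residues, rest = fields[0], fields[1], tuple(fields[2:])
        groups.setdefault((mass, rest), []).append(residues)

    merged = []
    for index, ((mass, rest), residues) in enumerate(groups.items(), start=1):
        merged.append(
            f"variable_mod{index:02d} = {mass} {''.join(residues)} {' '.join(rest)}"
        )
    return merged


for entry in merge_variable_mods(ORIGINAL):
    print(entry)
```

Run on the five original entries, this reproduces the four-entry list shown above. Whether a given merge is chemically sensible still has to be checked by hand.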

Pavel Rehulka

May 12, 2020, 8:53:34 AM
to Comet-ms support
Hi Jimmy,
thank you for your response. I tried to "update" Comet by downloading the beta version of SearchGUI 2.0.0 (I found a link to it somewhere in the discussion forum), which includes Comet 2019.01 rev. 4. Indeed, the speed improved significantly. And as you noted, including variable modifications seriously slows down the Comet DB search:
- no var. modifications: 0:00:53
- Met ox.: 0:02:24
- Met ox. + prot N-term Ac + pyroQ: 0:04:48
- Met ox. + prot N-term Ac + pyroQ + pyroE + pyroCamC: 0:14:50
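For reference, the timings listed above can be converted into slowdown factors relative to the search without variable modifications. A minimal sketch; the names `to_seconds`, `TIMINGS`, and `slowdown` are illustrative only:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Timings copied from the list above.
TIMINGS = {
    "no var. modifications": "0:00:53",
    "Met ox.": "0:02:24",
    "Met ox. + prot N-term Ac + pyroQ": "0:04:48",
    "Met ox. + prot N-term Ac + pyroQ + pyroE + pyroCamC": "0:14:50",
}

baseline = to_seconds(TIMINGS["no var. modifications"])
slowdown = {
    label: round(to_seconds(t) / baseline, 1) for label, t in TIMINGS.items()
}

for label, factor in slowdown.items():
    print(f"{factor:5.1f}x  {label}")
```

With these numbers, the search with all five modifications takes roughly 17 times as long as the one without variable modifications.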
I do not understand why the search gets so much slower when variable modifications are added. I could understand it for, e.g., deamidation of N or Q, where every theoretical peptide is on average multiplied many times. But these modifications cannot occur that often (e.g. how many methionines does an average protein sequence contain, or how many protein N-terminal acetylations can occur?). My very naive, rough estimate is that the number of theoretical peptides increases by probably no more than 50%. What happens during the search that makes the time increase about 15 times?
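One way to make such an estimate concrete is to count the modified forms of a single peptide. The sketch below is a simplified counting model, not a description of how Comet enumerates candidates: it assumes each modification is placed independently, ignores any overall cap on modifications per peptide, and the example peptide (two methionines plus an N-terminal Q) is hypothetical.

```python
from math import comb, prod


def forms_for_mod(sites: int, max_per_peptide: int) -> int:
    """Number of ways to place 0..max_per_peptide copies of one mod on `sites` sites."""
    return sum(comb(sites, k) for k in range(min(sites, max_per_peptide) + 1))


def total_forms(mods) -> int:
    """Multiply the per-mod counts; `mods` is a list of (sites, max_per_peptide)."""
    return prod(forms_for_mod(sites, limit) for sites, limit in mods)


# Hypothetical peptide: 2 Met (up to 3 oxidations allowed) and an N-terminal Q
# (at most 1 pyro-glu). The unmodified form is included in the count.
print(total_forms([(2, 3), (1, 1)]))
```

Under these assumptions a peptide without any modifiable site still has exactly one form, so the overall increase depends on how many peptides in the database carry such sites. The count alone does not account for the measured search times.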

In addition, the PeptideShaker report of the SearchGUI results goes from 3 confident hits for the DB search without any variable modifications, through 3 confident hits with the 3 variable modifications, to no confident hit (confidence of 85%) after searching with the 5 variable modifications described above. Maybe this is a question for the SearchGUI forum, but I am not sure...

PS_07.jpg


Best regards,
Pavel
