I have encountered some questions while using Comet for immunopeptidomics database searching.
I performed two separate search runs. For the first search, the reference database contained approximately 370,000 sequences (denoted as DB1), each around 50 amino acids in length, and I set the parameter search_enzyme_number = 0 to allow unspecific cleavage. For the second search, I used a reference database derived from DB1, consisting of all possible 8–25 amino acid peptides generated by in silico unspecific cleavage of the sequences in DB1 (totaling around 100 million entries), and set search_enzyme_number = 11 for "no cut." All other parameters were identical between the two runs.
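For reference, the in silico digestion step I describe can be sketched roughly as follows (a simplified Python illustration of the idea, not my actual pipeline code):

```python
# Enumerate every 8-25 amino acid substring of each database sequence,
# i.e. unspecific "digestion" with no enzyme constraint.
MIN_LEN, MAX_LEN = 8, 25

def unspecific_peptides(sequence, min_len=MIN_LEN, max_len=MAX_LEN):
    """Yield all substrings of `sequence` with length in [min_len, max_len]."""
    n = len(sequence)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            if start + length > n:
                break  # remaining substrings from this start are too long
            yield sequence[start:start + length]

# Example: a 50-residue sequence yields sum over k=8..25 of (51 - k)
# = 621 peptides before deduplication, so ~370,000 such sequences give
# roughly the database size quoted above once duplicates are merged.
```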
The first search completed in about 4 hours, whereas the second took approximately 30 hours. My understanding is that the second search simply uses a manually pre-digested version of the same unspecific search space, so in principle it should be comparable in cost to the first. I am therefore curious why the runtimes differ so significantly. Does Comet internally employ search-space reduction algorithms, such as an "optimized sliding window approach," to improve efficiency when search_enzyme_number = 0 is set?
If I only have access to a pre-digested peptide database, is it feasible to use the "no cut" setting directly? If so, what parameter adjustments could reduce the computational time? Alternatively, do you have any other suggestions for improving search efficiency under these conditions?
Thank you for your time and support!
Best regards,
Danqing Shen