Is it possible to set "no cleavage" when the database FASTA file contains the target peptides?

Xiaolong Cao

May 22, 2020, 9:51:33 PM

to Comet-ms support

Hello,

If I have a set of target peptides, is there a setting, something like "no enzyme", that allows no cleavage at either the N- or C-terminus?

I need this because I want to search a very large protein database whose protein sequences are highly similar. I think I can do a virtual (in silico) digestion and remove the duplicated peptides to speed up the search.

Thank you!

Jimmy Eng

May 22, 2020, 10:13:54 PM

to come...@googlegroups.com

The "No_enzyme" option actually cleaves at all residues; I should have named it "cut_everywhere" to be clearer. For no digestion, add your own enzyme rule to the comet.params file, such as entry 11 shown below, and then select that enzyme for your search using "search_enzyme_number = 11". The added enzyme definition directs Comet to cleave after a "Z" residue, which hopefully doesn't exist in your peptide database, so there will be no cleavage anywhere. When you generate your unique list of peptides, you can have them be separate, individual FASTA sequence entries and/or concatenate the peptides together using an asterisk (*) as a separator. Comet will interpret an asterisk as a break between peptides. Let me know if you have any follow-up questions. Good luck!

[COMET_ENZYME_INFO]
0.  No_enzyme              0      -           -
1.  Trypsin                1      KR          P
2.  Trypsin/P              1      KR          -
3.  Lys_C                  1      K           P
4.  Lys_N                  0      K           -
5.  Arg_C                  1      R           P
6.  Asp_N                  0      D           -
7.  CNBr                   1      M           -
8.  Glu_C                  1      DE          P
9.  PepsinA                1      FL          P
10. Chymotrypsin           1      FWYL        P

11. No_cut                 1      Z           -
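
For readers who want to build the non-redundant peptide FASTA described above, here is a minimal Python sketch. It assumes tryptic cleavage (after K or R, not before P, as in enzyme entry 1). The function names, the `pepN` headers, and the length and missed-cleavage defaults are illustrative choices, not anything taken from Comet itself.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_peptides(sequence, missed_cleavages=1, min_len=6, max_len=50):
    """Yield peptides cut after K or R, except when the next residue is P."""
    # Zero-width split (Python 3.7+); drop the empty piece left at the end.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                yield peptide


def unique_peptides(proteins, **digest_options):
    """Return each distinct peptide once, in order of first appearance."""
    seen, ordered = set(), []
    for _header, sequence in proteins:
        for peptide in tryptic_peptides(sequence, **digest_options):
            if peptide not in seen:
                seen.add(peptide)
                ordered.append(peptide)
    return ordered


def to_fasta(peptides, concatenate=False):
    """Format peptides as separate entries or as one '*'-separated entry."""
    if concatenate:
        return ">peptides_concatenated\n" + "*".join(peptides) + "\n"
    return "".join(f">pep{i}\n{p}\n" for i, p in enumerate(peptides, 1))


if __name__ == "__main__":
    # Hypothetical toy proteins that share one tryptic peptide.
    demo = [("P1", "MAAAAAKGGGGGGRPLLLLLK"), ("P2", "MAAAAAKTTTTTTR")]
    print(to_fasta(unique_peptides(demo, missed_cleavages=0)))
```

The resulting file would then be searched with the "No_cut" entry selected, so that Comet does not digest the peptides a second time.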

Xiaolong Cao

May 22, 2020, 10:27:45 PM

to Comet-ms support

Thank you so much, Jimmy!

I did not expect to get a solution from you so fast! I will test that.

Much appreciated!

Phillip Wilmarth

May 23, 2020, 12:05:24 PM

to Comet-ms support

Hi All,

I thought I would post an observation about making a non-redundant peptide digest and using that in a search. Assuming you filter the digest to remove redundancy and consider I/L indistinguishable, each peptide sequence is unique and can serve as its own key. The peptide accessions in a peptide FASTA file can be the peptide sequences themselves. You can make lookup tables (such as which proteins contain each peptide) with the peptide sequences as the keys. This does not really have anything to do with the question here, but it can be a useful idea to keep in mind for pre- and post-processing of peptide databases.

Cheers,

Phil
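
The peptide-as-key idea above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: I/L equivalence is handled by rewriting every I as L (the two residues have the same mass), and the names `il_key`, `build_peptide_index`, and `peptide_fasta` are made up for this example.

```python
from collections import defaultdict


def il_key(peptide):
    """Collapse isoleucine and leucine so I/L variants share one key."""
    return peptide.replace("I", "L")


def build_peptide_index(protein_peptide_pairs):
    """Map each I/L-collapsed peptide to the set of proteins containing it.

    protein_peptide_pairs is an iterable of (protein_id, peptide) tuples,
    for example the output of an in silico digest.
    """
    index = defaultdict(set)
    for protein_id, peptide in protein_peptide_pairs:
        index[il_key(peptide)].add(protein_id)
    return index


def peptide_fasta(index):
    """Write a peptide FASTA in which each accession is the peptide itself."""
    return "".join(f">{key}\n{key}\n" for key in sorted(index))


if __name__ == "__main__":
    # Hypothetical digest output: two proteins sharing an I/L-variant peptide.
    pairs = [("P1", "PEPTIDEK"), ("P2", "PEPTLDEK"), ("P2", "SAMPLER")]
    index = build_peptide_index(pairs)
    print(peptide_fasta(index))
    print(sorted(index["PEPTLDEK"]))
```

After the search, the protein column of the results holds the peptide key, so the full protein list for any hit can be recovered from `index` regardless of how many proteins the search engine itself reports.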

Xiaolong Cao

May 28, 2020, 10:24:27 PM

to Comet-ms support

After spending some time testing Comet, I think the "Comet indexed peptide database" works here. With it, Comet builds the database itself and combines redundant peptides. If the database file is large, this can save a lot of time. In my test run, CPU-hour consumption dropped by about 90% when using the "Comet indexed peptide database", and memory consumption remained small. The running time was about the same, because multithreading does not seem to work with the "Comet indexed peptide database". I think it is a good way to save CPU hours.

However, I do have some concerns. Searching with the proteins as the database and searching with the "Comet indexed peptide database" give highly similar but not identical results (I added the decoy sequences manually):

1) The "Comet indexed peptide database" does not seem to report all protein IDs. The number of protein IDs assigned to a peptide is much smaller than when the proteins are used as the database (with max_duplicate_proteins = -1, to include all proteins).

2) The "Comet indexed peptide database" may give peptides a different "Label". Some peptides exist in both the target and decoy databases. They are labeled "1" when protein sequences are used as the database but may be labeled "-1" with the "Comet indexed peptide database". The labeling seems to depend on the order of decoy and target proteins in the database: if the target sequences come before the decoy sequences when the index is built, such a peptide is more likely to be labeled "1".

My plan is to perhaps assign the "Label" with my own code.

Or maybe do two runs, one with only the target database and one with only the decoy database? But if so, I don't know the proper way of combining the outputs for Percolator. I hope to get some suggestions. Thank you!

Jimmy Eng

May 29, 2020, 2:39:22 PM

to Comet-ms support

Comet's indexed database support was developed for the real-time search (RTS) application. The output of an RTS does return all matched proteins but, as you observe, this functionality was not ported back to the standard Comet outputs (text, pep.xml, Percolator pin). So if you run a regular Comet search against an indexed database, you currently get only a single protein identifier returned, and it should be the first/earliest protein in the FASTA file.

The notion of a -1 decoy protein label is only relevant for Comet's internal decoys and internal decoys aren't supported in an indexed database search in current released binaries. So I am surprised that you're getting -1 labels for an indexed database search of your own target-decoy FASTA file. I'm not sure how that's possible unless there's a bug in the code as Comet currently doesn't know which sequences within a FASTA file are decoy entries. Anyways, updating the target-decoy labels yourself is your best solution right now. And I would stick with a single run against an indexed database generated from a combined target-decoy FASTA file instead of performing separate target and decoy searches. If you're curious, you should post on the Crux or Percolator forums but I believe the Percolator developers were initially proponents of separate target and decoy searches but have since started supporting combined target-decoy searches (or target-decoy competition in their terms).

And if it's not obvious, if a peptide is present in both a target and a decoy sequence, the peptide in question is still "real" and a valid peptide; just because a poor decoy peptide was generated doesn't somehow make a real target sequence questionable in any way. So such peptides should always be labeled as targets. And if this is a pervasive problem, fix how your decoys are being generated. I have recently added Comet internal decoy support for indexed database searches, and I will try to update the standard Comet output results to return all matched proteins from an indexed database search ... all for the next release.

Jimmy
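
A hedged Python sketch of the "update the labels yourself" suggestion, following the rule above that a peptide found in any target sequence is a target. It assumes peptide strings may carry flanking residues (as in "K.PEPTIDER.A") and bracketed modification masses, and that each result row exposes "Peptide" and "Label" fields. Those layout details and all function names are assumptions for illustration, so check them against your actual output files.

```python
import re


def bare_sequence(peptide):
    """Strip flanking residues and modifications, keeping only residues."""
    # Assumed layout: X.SEQUENCE.X with optional [mass] modification tags.
    if len(peptide) >= 4 and peptide[1] == "." and peptide[-2] == ".":
        peptide = peptide[2:-2]
    peptide = re.sub(r"\[[^\]]*\]", "", peptide)
    return "".join(ch for ch in peptide if "A" <= ch <= "Z")


def relabel(peptide, target_peptides):
    """Return 1 if the peptide occurs in the target set, otherwise -1."""
    return 1 if bare_sequence(peptide) in target_peptides else -1


def relabel_rows(rows, target_peptides):
    """Return copies of the row dicts with 'Label' recomputed."""
    updated = []
    for row in rows:
        new_row = dict(row)
        new_row["Label"] = relabel(row["Peptide"], target_peptides)
        updated.append(new_row)
    return updated


if __name__ == "__main__":
    # Hypothetical target peptide set, e.g. from an in silico target digest.
    targets = {"PEPTIDER", "SAMPLEK"}
    rows = [
        {"Peptide": "K.PEPTIDER.A", "Label": -1},  # shared with a decoy
        {"Peptide": "R.REDITPEP.K", "Label": -1},  # decoy only
    ]
    for row in relabel_rows(rows, targets):
        print(row["Peptide"], row["Label"])
```

Building `target_peptides` from a digest of the target proteins only keeps the rule independent of the order of target and decoy entries in the FASTA file.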

Xiaolong Cao

May 31, 2020, 3:37:55 PM

to Comet-ms support

Thanks, Jimmy.

I read the Percolator paper, and they run the target and decoy searches separately. But I think it makes sense that competition between target and decoy in one run may provide more stringent results.

For Crux, I noticed that if a peptide is in both target and decoy, it is labeled as decoy. Comet labels it as target, which is the right way.

For the indexed database, I think the "-1" labels appear because, in the param file, sequences with the name prefix “DECOY_” are treated as decoys. Also, most of the time only one protein ID is reported, but sometimes more than one is reported with the indexed database. I hope this helps you when developing the next release.

Thank you again for your explanation. It is really helpful for me.

Jimmy Eng

Jun 1, 2020, 6:31:31 PM

to Comet-ms support

Thanks Phil. Unrelated to this thread, but I wanted to mention that I just found your real-time search data re-analysis and thought it was cool. I'm going to have to go back and read it more slowly/thoroughly, but I wanted to commend you on it!
