Can anyone share a Notepad version of comet.params, version 2024?


sudarshan kumar

Jul 26, 2024, 3:23:18 AM
to spctools-discuss
Please share a Notepad version of comet.params, version 2024.

Luis Mendoza

Jul 26, 2024, 4:00:57 AM
to spctools...@googlegroups.com
Hello,
You can create a comet parameters file using Petunia.  Simply choose the "Files" menu, then go to the desired directory (or create a new one), and then look for and click on the "New" button at the bottom of the window; you can then choose to create a new file and give it any name you want:

image.png

Hope this helps,
--Luis



--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/c944987f-aa59-45fb-82a9-95760ea49213n%40googlegroups.com.

sudarshan kumar

Jul 26, 2024, 7:47:26 AM
to spctools...@googlegroups.com
Luis,
Thank you so much. I could do it. 
Best 
Sud



--
-------------------------------------------------------------------
The real voyage of discovery consists not in seeking new lands but seeing with new eyes. — Marcel Proust

Dr. Sudarshan Kumar
(Fulbright-Nehru Fellow)
(B.V.Sc.& A.H., M.V.Sc., PhD.)
Sr. Scientist
Animal Biotechnology Center
(Proteomics and Cell Biology Lab.)
National Dairy Research Institute Karnal, 132001
Haryana, India
Contact No 09254912456
URL www.ndri.res.in

Jimmy Eng

Jul 26, 2024, 12:44:29 PM
to spctools-discuss
Just so that you're aware, this can also be downloaded from the Comet website for each release.   Here's the parameters page for the 2024.01 release and you can find the parameters page for all prior releases here.  Every parameter is described and example comet.params files for each release version can be downloaded at the head of each parameters release page.

sudarshan kumar

Jul 26, 2024, 1:19:17 PM
to spctools...@googlegroups.com
Thank you so much to both of you, Luis and David. It was worth it. Otherwise, every time we used to work with the data class comet file.

It worked. 


David Shteynberg

Jul 26, 2024, 6:43:44 PM
to spctools-discuss
Hello again Sud,
comet.params
image.png
image.png

sudarshan kumar

Jul 29, 2024, 7:57:24 AM
to spctools...@googlegroups.com
Hi David,
Thank you so much for doing an exhaustive study on the data. 
Yes, I agree with your keen observation that the data look like poorly digested peptides. I repeated the analysis with both semi-tryptic and fully tryptic searches. For the fully tryptic search I was getting 0 probability from the Comet results (though the Comet search itself contained correct hits), while the semi-tryptic search (with both PeptideProphet and iProphet) returned proteins with good probability.
I wonder: in the Summary tab I see around 300 proteins identified, but when I look at the protein detail sheet (sorted by PSM number) it shows hardly 10-12 proteins with at least 1 PSM. All the remaining entries have 0 PSMs. Why is this? Please also explain: the top hit (by number of PSMs) has more than 150 PSMs, while the list of identified proteins with at least one PSM is very small, hardly 10-12. Why? I expected more hits, with PSMs distributed more evenly across the proteins.

Thank you so much!



On Sat, Jul 27, 2024 at 4:13 AM David Shteynberg <dshte...@systemsbiology.org> wrote:
Hello again Sud,

First, you can find my comet.params file attached.  It is modified to a set of parameters that I selected after having played a bit more with your dataset to try to discover some other reason why you might be getting a low number of correct IDs.  One thing I am noticing (after having performed a semi-tryptic search with comet) is that the majority of correct peptide IDs are semi-tryptic.  This is expected among incorrect results, but among correct results this indicates a potential issue with tryptic digestion of the sample.  The model for NTT is learned automatically by PeptideProphet and is pasted here:

I recommend this data is searched without strict tryptic-end requirements on the peptides.

Cheers!
-David



sudarshan kumar

Jul 29, 2024, 8:29:43 AM
to spctools...@googlegroups.com
Hi David,
Please also explain why I see 0 probability for Butyrophilin (red-marked row) although there are 78 PSMs for it. I know this is a correct hit given the nature of the sample. If the issue is poor sample preparation resulting in semi-tryptic peptides, I have already searched with 1, i.e. semi-tryptic peptides. It should be given a statistical score as a correct hit, and it should contribute to the correct model. What is the best way to draw conclusions from such a data analysis? I cannot run the sample again.
image.png


David Shteynberg

Jul 29, 2024, 11:02:10 AM
to spctools-discuss
Hello Sud,

No problem!    

There is a difference between PeptideProphet reporting a probability of “0” for a PSM and a probability of “0.0000”.   The lone zero “0” represents the case where PeptideProphet did not find a successful model for the mixture distribution of correct and incorrect results and returned no model, as opposed to the model giving a low probability.  So if all your probabilities come back as “0”, it means there is no model, and you have to adjust the analysis model or the search parameters, or look for another issue with the data. When you see “0.0000”, it means the spectrum had a low score based on the model that was returned.
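
A minimal sketch of this distinction, assuming the probabilities have been exported as text exactly as displayed (the export step and the function names are hypothetical, not part of the TPP):

```python
def classify_probability(raw: str) -> str:
    """Interpret a displayed PeptideProphet probability string.

    Assumption: a bare "0" means no model was fitted, while "0.0000"
    means a model was fitted and gave this PSM a low probability.
    """
    text = raw.strip()
    if text == "0":
        return "no model"
    value = float(text)
    if value == 0.0:
        return "modeled, low probability"
    return "modeled"


def model_failed(column) -> bool:
    """True when every displayed probability is the bare "0"."""
    values = list(column)
    return bool(values) and all(
        classify_probability(v) == "no model" for v in values
    )


if __name__ == "__main__":
    print(classify_probability("0"))       # no model
    print(classify_probability("0.0000"))  # modeled, low probability
    print(model_failed(["0", "0", "0"]))   # True
```

The comparison has to be done on the text, because once the value is parsed as a number the two cases are indistinguishable.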

The reason you are seeing many more protein numbers in the PepXML Viewer (Summary Tab) as opposed to after running ProteinProphet is likely because you haven’t applied any threshold filtering to the probability (or other scores). You are seeing all the hits here as opposed to the “likely correct” hits.

Regarding the Butyrophilin, it appears to have several isoforms. The first one, which got the high probability, is necessary to explain all the observed peptides for this isoform protein family group. The other proteins in the family share many of their peptides with the top hit and come along for the ride, without independent peptide evidence that would distinguish them from the other isoforms.

Please let me know when you have further questions.

Cheers!
-David 


sudarshan kumar

Jul 30, 2024, 8:57:20 AM
to spctools...@googlegroups.com
Hi David,
Thank you for clearing my doubts. 

I have a few more queries.
You said "The reason you are seeing many more protein numbers in the PepXML Viewer (Summary Tab) as opposed to after running ProteinProphet is likely because you haven’t applied any threshold filtering to the probability (or other scores). You are seeing all the hits here as opposed to the “likely correct” hits."

I tried to analyze other run files. It is a blood sample run on an Orbitrap Fusion. The total number of scans is around 88000 (I consider it a high number).
Up to the Comet search there are many peptide hits (up to 50000). But as soon as I apply the validation statistics/models (PeptideProphet or iProphet), the number of unique peptides falls to 300. This drastic reduction in the number of accurate peptides, and hence proteins, makes me think that I am not using the correct statistical models.

I find it unbelievable that such a large number of PSMs yields only 300 proteins, and that too in blood.
image.png

Original, without applying the error filter:
image.png






David Shteynberg

Jul 30, 2024, 10:54:35 AM
to spctools-discuss
Hello Sud,

Please remember the job of the TPP is to separate the correct results (signal) from the random matches (noise) on the PSM level, peptide level (PTM level) and protein level.  Generally speaking, most datasets have a large portion of incorrect PSMs, which by the nature of statistics will match to random peptides and proteins selected from the search database.  It is standard in our lab that we see large analyses of tens of millions of correct PSMs that map to tens of thousands of correct peptides and perhaps only a few thousand or so proteins, depending on the sample (e.g. blood plasma). So I am not concerned about the total numbers of proteins you are seeing after applying statistical cut-offs.

Certainly there are other models you can try in PeptideProphet that might alter your results slightly.  To have a deeper look at your data you have to find where the other correct PSMs might be recovered, e.g. you can try a semi-tryptic (or unconstrained) search, as we did in this thread, if you think the digestion was the issue.  Also, you can search for additional PTMs that are present in the sample but that the existing search is missing.

I recommend you focus on the sensitivity of your analysis rather than the absolute total of proteins identified, without consideration for error. The goal of these tools is to give you a user-controlled accurate error rate while maximizing the sensitivity (the number of correct identifications out of the total correct identifications possible in the entire analysis).   One way to apply the TPP is to use the results to improve the laboratory methods to try to maximize the return of correct proteins from the samples.  Also, replicates, both biological and technical, are very helpful in separating the correct signal from random noise.
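
As a rough illustration of the error/sensitivity trade-off, here is a minimal sketch that estimates both quantities from a list of per-PSM probabilities at a chosen minimum-probability cutoff. It assumes the probabilities are well calibrated, so the expected number of correct PSMs is the sum of their probabilities; the function name and the toy values are hypothetical, and this is not the TPP's own code.

```python
def error_and_sensitivity(probabilities, min_prob):
    """Estimate (error rate, sensitivity) at a probability cutoff.

    error       = expected incorrect among accepted / number accepted
    sensitivity = expected correct among accepted / expected correct overall
    """
    total_correct = sum(probabilities)
    accepted = [p for p in probabilities if p >= min_prob]
    if not accepted or total_correct == 0:
        return 0.0, 0.0
    expected_correct = sum(accepted)
    error = 1.0 - expected_correct / len(accepted)
    sensitivity = expected_correct / total_correct
    return error, sensitivity


if __name__ == "__main__":
    # Toy probabilities, for illustration only.
    probs = [0.99, 0.95, 0.80, 0.40, 0.05, 0.01]
    for cutoff in (0.9, 0.3):
        err, sens = error_and_sensitivity(probs, cutoff)
        print(f"min_prob={cutoff}: error={err:.3f}, sensitivity={sens:.3f}")
```

On the toy list, relaxing the cutoff from 0.9 to 0.3 accepts more PSMs, so both the estimated error and the estimated sensitivity go up.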

Hope this helps!

-David

sudarshan kumar

Jul 31, 2024, 2:57:37 AM
to spctools...@googlegroups.com
Thank you David,
I agree with you. It seems you are looking at the problem from a statistician's point of view, and I am looking at it from a biologist's point of view.

Please see this image for Serpin A3-4. There are many observed peptides (157 PSMs for this protein). It is hard to believe that its ProteinProphet probability is 0.0000. Please see below this image. My dilemma is whether I should consider this protein or leave it out.

image.png
image.png

image.png
Also, please explain why the red-marked protein has 0 sequence coverage and 0 probability but 157 PSMs, because I see many observed peptides in the protein details.

Please also suggest what changes I should make in the prophets so that many of the spectra which are true do not move to the negative distribution (false hits) and instead fit the correct model, thus returning good probabilities for the identified proteins.


Best regards,
Sud




sudarshan kumar

Jul 31, 2024, 4:02:09 AM
to spctools...@googlegroups.com
Hi David,
I am reading your paragraph again and again to understand it fully, word by word.

You said: "I recommend you focus on the sensitivity of your analysis rather than the absolute total of proteins identified, without consideration for error."

I usually use the minimum probability cutoff corresponding to an error rate of 0.05 to filter my data. You mean to suggest that I can go for higher sensitivity. If I look at this table, at a higher sensitivity (.8965) I also get a similar number of correct hits (5268), which corresponds to an error rate of .02. Even if I increase the sensitivity threshold on my data, the number of correct hits keeps going down (as per the statistics), which will further reduce the absolute number of correct proteins identified.

As a researcher, I want the prophet to discard as few spectra as possible. What intrigues me is that out of 88000 scans/spectra only 5000 are assigned to peptides at an error rate of .05. Do you think this is normal?

image.png

David Shteynberg

Jul 31, 2024, 11:55:34 AM
to spctools-discuss
Hello Sud,

ProteinProphet evaluates this from the perspective of Occam's razor.  So the goal is to return the shortest list of proteins that is able to explain all the peptide observations.  When protein A shares all of its peptides with another protein B that also has additional peptides that are not observed in A, protein B is sufficient to explain all observed peptides; thus protein B has a high probability and protein A has a low probability.  ProteinProphet will show them as members of a larger group (in your example, group 96) of proteins that share peptides.  This is *not* claiming that protein A with probability 0.000 is not in the sample; it is just not necessary to explain the peptide observations given protein B.  For ProteinProphet to assign protein A a high probability, it would need to observe a unique peptide, not seen in protein B, that would distinguish A from B.
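
The subset relationship described here can be sketched in a few lines. This is a deliberately simplified illustration, not ProteinProphet's actual probabilistic algorithm: it only flags proteins whose observed peptides are a proper subset of another protein's, and it does not handle proteins with identical peptide sets. The protein and peptide names are hypothetical.

```python
def subsumed_proteins(protein_peptides):
    """Map each subsumed protein to one protein that explains it.

    A protein is subsumed when its observed peptide set is a proper
    subset of another protein's observed peptide set.
    """
    subsumed = {}
    for name, peptides in protein_peptides.items():
        for other, other_peptides in protein_peptides.items():
            if other != name and peptides < other_peptides:
                subsumed[name] = other
                break
    return subsumed


if __name__ == "__main__":
    observed = {
        "protein_B": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
        "protein_A": {"PEPTIDEA", "PEPTIDEB"},  # all shared with B
        "protein_C": {"PEPTIDED"},              # has its own evidence
    }
    print(subsumed_proteins(observed))  # {'protein_A': 'protein_B'}
```

In this toy case protein_A would only stop being subsumed if a peptide unique to it were observed.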

To see a graphical representation of the shared and unique peptides mapping to each protein in a ProteinProphet protein group, you can click on the protein group number.  This view shows, with boxes, the shared and unique peptides for a given protein group, for example:







To optimize your search further you can try adding PTMs to your search.  You can also adjust the mass tolerances (precursor and fragment) to match your instrument.  

It is prudent to pay close attention to the models returned by the algorithms to understand where the method can improve (e.g. how we identified that the digestion was incomplete in your data.)

-David


David Shteynberg

Jul 31, 2024, 12:27:27 PM
to spctools-discuss
Absolutely!  Another thing to understand here is that the statistical analysis happens on several data reduction layers, where PeptideProphet works on the level of PSMs, iProphet works on the level of peptide sequences and ProteinProphet works on the level of proteins.  Since these layers stack on top of each other, small errors from incorrect statistics at an earlier layer propagate into much larger errors at the later levels.  Keeping entrapment decoys in the database allows one to have another evaluation of the error in addition to the TPP models' estimation and provides an FDR estimate that is independent from the TPP.  Unlike "True Positives", independent entrapment decoys are "True Negative" random matches that are not biased by the researcher's prior expectations.
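
One simple way to turn entrapment hits into an independent error estimate is sketched below. The estimator is an assumption chosen for illustration, not something the TPP reports: entrapment hits are known to be false, and if the entrapment portion of the database is r times the size of the target portion, roughly n_entrapment / r further false hits are expected to be hiding among the target hits. The function and parameter names are hypothetical.

```python
def entrapment_fdp(n_target, n_entrapment, r=1.0):
    """Estimate the false discovery proportion from entrapment hits.

    n_target     : accepted identifications mapping to target sequences
    n_entrapment : accepted identifications mapping to entrapment sequences
    r            : entrapment database size / target database size
    """
    accepted = n_target + n_entrapment
    if accepted == 0:
        return 0.0
    estimated_false = n_entrapment * (1.0 + 1.0 / r)
    return min(1.0, estimated_false / accepted)


if __name__ == "__main__":
    # Toy counts, for illustration only.
    print(f"{entrapment_fdp(980, 20, r=1.0):.3f}")  # 0.040
```

If this estimate is much higher than the error rate reported by the models at the same cutoff, that is a hint that the models are too optimistic for the dataset.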

Also, there may be a misunderstanding: as you lower the minimum probability threshold the error increases, and the sensitivity also increases (hopefully much faster than the error!)

Cheers!

-David

sudarshan kumar

Aug 1, 2024, 6:36:27 AM
to spctools...@googlegroups.com
Hi David,
Thank you so much. I learnt a lot from your discussion.

Can you please go through the ppt? I have questions about how to include an isoform of a protein that was assigned 0 probability in our list of identified proteins. It is not possible if I use the error rate cutoff criterion to filter the list.
Case: I am studying a tissue proteome. It has more than 30 isoforms of a protein called pregnancy-associated glycoprotein. I miss many of them in my list of proteins if I filter the protein list at an error rate of .05.

The sensitivity/error table gives me a last error rate cutoff of .2086. How advisable is it to move still lower, to a min_prob of .1 or .01?

When I lower the min_prob cutoff beyond .2086, it increases the number of identifications by including more proteins which belong to the already reported "group of proteins", some of which have a high number of PSMs. I can see those isoforms also.

Please give your expert opinion. 







query.pptx

David Shteynberg

Aug 1, 2024, 4:41:35 PM
to spctools-discuss
It sounds like you are most interested in the identified peptides rather than the sufficient set of proteins able to explain all of your peptide observations.  The peptides you are seeing map to many proteins, and unfortunately you are not observing peptides that would individually distinguish the protein isoforms.  Unfortunately, this is likely a common problem and there is no one-size-fits-all solution.  Maybe the search parameters (tolerances and PTMs) can be further optimized? Perhaps the isoforms you are seeking are modified with some PTM that is currently not in your parameter set?  Maybe you can apply a targeted acquisition, going specifically after the peptides that would distinguish the isoforms?  Not really sure, but the TPP data-driven analysis seems to be working OK in providing some clues, given the data.

Cheers!
-David 


Luis Mendoza

Aug 2, 2024, 10:33:34 PM
to spctools...@googlegroups.com
Hello Sud,

Just wanted to make you aware of the MADCAPS tool in TPP.  It performs sequence alignment and in-silico digestion, which may help you visualize and explore sequences that could definitively identify some of your isoforms.

The quick way to access it is via the "alignment" glyph in ProtXMLViewer (also in other tools):
image.png

This link will open a new page with several sections.  The Digestion tab displays enzyme-digested sequences, along with indications for those that have been observed in your experiment, and whether they are uniquely-mapping, among other information.

In the following example, you can see that even though some observed peptides (red) match the A1 isoform, none of them are uniquely mapping; you can also see that there are several unobserved peptides (green) that are uniquely mapping:

image.png

You could then investigate whether any of those uniquely-mapping peptides have been observed by others (for example, in PeptideAtlas), and if there are ways to target those peptides for identification.  Note that you can filter this view in various ways by using the column headings and filter text box.

The Alignment tab gives you a whole sequence view:

image.png

And note that you can change the protein and peptide lists, and digestion parameters in the File and Options tab:

image.png

One word of caution, however: this tool may take a very long time to open for large protein groups, as it will try to align all members.  You may wish to include only certain protein sequences if this is the case.

Hope this helps!
--Luis


sudarshan kumar

Aug 12, 2024, 7:58:37 AM
to spctools...@googlegroups.com
Thank you Luis,
I am using it. 

Why is there a difference in the identification of proteins between TPP 6.1 with the 2023 comet.params, which I was using before, and TPP 7.1.0 with the 2024 comet.params, which I am using now? The difference is huge, although I am using the same raw file.

My concern is that many of the spectra are discarded as useless in TPP 7.1.0. For example, out of 22000 PSMs, only 2000 are matched to peptides at an error rate of .05, while there were more in TPP 6.1. I have tried to keep the PeptideProphet settings the same in both versions.





David Shteynberg

Aug 12, 2024, 11:19:06 AM
to spctools...@googlegroups.com
Hello Sud,

Can you please closely compare the parameters for comet 2023 vs comet 2024 that you used to search the data? Using similar parameters should give you similar performance.   Please ascertain that fragment and precursor tolerances are the same.
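
A quick way to make such a comparison is to parse both parameter files and list every setting that differs. The sketch below assumes the usual `name = value  # comment` layout of comet.params and stops at the first bracketed section (such as the enzyme table at the end of the file). The sample values are illustrative only and are not the settings used in this thread.

```python
def parse_params(text):
    """Parse 'name = value  # comment' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("["):
            break  # bracketed section reached, no more name = value lines
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def diff_params(old, new):
    """Return {name: (old_value, new_value)} for settings that differ."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Illustrative file contents only.
OLD_TEXT = """
peptide_mass_tolerance = 20.0
fragment_bin_tol = 0.02      # high-resolution fragment setting
num_enzyme_termini = 2
[COMET_ENZYME_INFO]
0.  Cut_everywhere  0  -  -
"""
NEW_TEXT = """
peptide_mass_tolerance = 20.0
fragment_bin_tol = 1.0005    # low-resolution fragment setting
num_enzyme_termini = 1
[COMET_ENZYME_INFO]
0.  Cut_everywhere  0  -  -
"""

if __name__ == "__main__":
    changes = diff_params(parse_params(OLD_TEXT), parse_params(NEW_TEXT))
    for name, (old, new) in changes.items():
        print(f"{name}: {old} -> {new}")
```

Values are compared as text, so `0.02` and `0.020` would be reported as different. A parameter that exists in only one file shows up with `None` on the other side, which also helps spot settings that were renamed or added between releases.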

Meanwhile, I will test this on my computer using your data.

Thanks,
-David

David Shteynberg

Aug 12, 2024, 12:09:33 PM
to spctools...@googlegroups.com
Hello again Sud!

I just ran a test comparing combinations of  comet from 6.1 validated with tpp 6.1 and 7.1, and compared to comet from 7.1 validated with 7.1.  What I am seeing is higher sensitivity in the newer versions overall, although changes are incremental.  I still suspect you are using different parameters in your comparison.  Here is the sensitivity table from my analyses:

Comet (from 6.1) + TPP 6.1
image.png

Comet (from 6.1) + TPP 7.1
image.png

Comet (from 7.1) + TPP 7.1
image.png


Cheers!
-David