Issues with processing .d (Bruker) files run in DDA-LFQ mode


Shagun Gupta

Aug 13, 2024, 10:50:49 AM
to spctools-discuss
Hi all

I have been experiencing issues processing .d files obtained from a Bruker timsTOF HT in DDA-LFQ mode, specifically with getting quantification (precursor intensity) per spectrum. I am using TPP V6.3.3 Arcus on a Windows computer.

Details
- There are 12 .d files (4 repeats of 3 conditions) from samples composed of a human proteome with a yeast proteome spiked in at different ratios.
- Converted to .mzXML using msconvert
- Searched with Comet against a database taken from UniProt for Homo sapiens + yeast
- Processed with PeptideProphet (filtered at the probability corresponding to 1% FDR), XPRESS, ProteinProphet. Ran with the -PREC flag (PeptideProphet) and the -i flag (XPRESS)
- Want to do hypothesis testing (comparing the three conditions pairwise) using MSstats, so I need raw precursor intensity values to build a file that can be used as input to MSstats.
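
For readers following along, the MSstats input mentioned in the last bullet is a long-format table with ten fixed columns. Below is a minimal Python sketch of assembling it, assuming the per-PSM values (protein, peptide, charge, run, precursor intensity) have already been extracted from the TPP output. The dictionary keys and the `run_annotation` mapping are hypothetical names chosen for illustration and are not part of TPP or MSstats.

```python
import csv

# The ten columns of the MSstats long-format input table.
MSSTATS_COLUMNS = [
    "ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
    "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
    "Run", "Intensity",
]


def to_msstats_rows(psms, run_annotation):
    """Turn per-PSM records into MSstats rows.

    psms: iterable of dicts with the hypothetical keys
          protein, peptide, charge, run, intensity.
    run_annotation: dict mapping run name -> (condition, bio_replicate).
    """
    rows = []
    for psm in psms:
        condition, replicate = run_annotation[psm["run"]]
        intensity = psm["intensity"]
        rows.append({
            "ProteinName": psm["protein"],
            "PeptideSequence": psm["peptide"],
            "PrecursorCharge": psm["charge"],
            "FragmentIon": "NA",      # MS1-level quantification, no fragment
            "ProductCharge": "NA",
            "IsotopeLabelType": "L",  # label-free data is all "light"
            "Condition": condition,
            "BioReplicate": replicate,
            "Run": psm["run"],
            # Missing or zero areas are written as NA rather than 0.
            "Intensity": intensity if intensity else "NA",
        })
    return rows


def write_msstats_csv(rows, handle):
    """Write the rows as CSV with the MSstats column order."""
    writer = csv.DictWriter(handle, fieldnames=MSSTATS_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

If several PSMs map to the same peptide, charge, and run, they would still need to be collapsed to one row per feature and run before the table is passed to MSstats; how to collapse them (for example, keeping the maximum) is a choice this sketch leaves open.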

Unfortunately, after trying the above and a few more things, I get a large number of PSMs passing FDR (~30k), but most of them have no precursor intensity value (<1k have any "light area" value). Using the "light area" values that are present also does not give the expected results (it's a benchmarking dataset, and processing with MSFragger gave excellent results that align with the expected ratios). Could you suggest what I could do differently to get the right results? (I imagine it might have something to do with the initial conversion to mzXML itself?)

Happy to share any other details needed!

Best
Shagun

David Shteynberg

Aug 13, 2024, 11:05:42 AM
to spctools...@googlegroups.com
Hello Shagun,

Thank you for the detailed report.  If you are able, please first compress (into a zip or similar) and then share some of the problem .d files so I can try to replicate this issue on my computer before I offer any suggestions.  

Cheers!
-David

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/cf666974-5c1f-4de0-80ad-d9b1df2173dan%40googlegroups.com.

Shagun Gupta

Aug 13, 2024, 12:47:46 PM
to spctools-discuss
Hi David

I have attached a link to the following files 
- one replicate per condition .d file (file starting with "C_" had IDs but no values that could be extracted for quantification)
- comet parameter file I used
- fasta file I ran the search with.

Let me know if I can share anything else!

Best
Shagun

David Shteynberg

Aug 13, 2024, 3:09:37 PM
to spctools...@googlegroups.com
Hello Shagun,

I downloaded the larger zip file twice and tried to decompress, but each time it told me there was a corruption in the zip file.  The smaller files downloaded fine.  Can you check your file and upload again?

Thanks!
-David

Shagun Gupta

Aug 13, 2024, 5:15:58 PM
to spctools-discuss
Hi David

Apologies for the hassle! I have uploaded the .d files in an unzipped format under TPP_diagnosis/TPP_diagnosis. Let me know if this works instead?

Best
Shagun

David Shteynberg

Aug 13, 2024, 6:30:45 PM
to spctools...@googlegroups.com
Hello Shagun,

Thank you!  I was able to download your data this time and convert it to mzML files.
Internally, we use tdf2mzml.py to convert the .d directories into mzML files for the TPP for DIA data (diapysef for DDA data). Here is an example of the command we use on our Linux computer:


tdf2mzml.py --ms1_type centroid --compression zlib -i A_500ng_DDA_HM10_100_1p9_25minGT_Slot2-1_1_4647.d -o A_500ng_DDA_HM10_100_1p9_25minGT_Slot2-1_1_4647.mzML

Would you be willing to try using that converter to generate the mzML files before you search with Comet+TPP?
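
With 12 .d directories to convert, the single command above can be wrapped in a small loop. The following is a minimal Python sketch that reuses exactly the flags from that command. It assumes `tdf2mzml.py` is reachable on the PATH; the function names and the `dry_run` switch are hypothetical additions for illustration.

```python
import subprocess
from pathlib import Path


def build_tdf2mzml_command(d_dir, out_dir):
    """Build the tdf2mzml.py command for one .d directory."""
    d_dir = Path(d_dir)
    # Path.stem drops only the trailing ".d" suffix.
    out_file = Path(out_dir) / (d_dir.stem + ".mzML")
    return [
        "tdf2mzml.py",  # assumed to be on the PATH
        "--ms1_type", "centroid",
        "--compression", "zlib",
        "-i", str(d_dir),
        "-o", str(out_file),
    ]


def convert_all(data_dir, out_dir, dry_run=True):
    """Convert every .d directory under data_dir, one command per run."""
    commands = [
        build_tdf2mzml_command(d, out_dir)
        for d in sorted(Path(data_dir).glob("*.d"))
        if d.is_dir()
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands
```

Calling `convert_all("raw", "mzml")` only prints the commands; passing `dry_run=False` runs them.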

Thanks!
-David

David Shteynberg

Aug 14, 2024, 11:47:31 AM
to spctools-discuss
Hello Shagun,

I was able to run Comet and the TPP on these files with the latest TPP version, 7.1.0.  After adding two independent deBruijn randomized decoys to the database using the Petunia Decoy Database tool, my pipeline found approximately 80 thousand to almost 90 thousand PSMs at 1% spectrum error rate, per mzML file.  These map to about 68 thousand unique peptides at 1% peptide error rate, after iProphet combining all the files.  ProteinProphet mapped all of these to just over 7000 proteins at 1% protein error rate (or lower).

The biggest issue I found in your comet params file was that N-terminal acetylation was specified using the MSFragger notation '[^' for the amino acid (it should be 'n' instead for Comet).  Unfortunately, this confused the pipeline and exposed downstream assumptions that broke the analysis.
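
A quick way to catch this kind of notation mix-up before searching is to scan the `variable_mod` lines of the params file. The following is a minimal Python sketch, not part of Comet or the TPP. It assumes the residue field is the second value after the `=` sign and that valid entries consist of uppercase amino-acid letters plus lowercase `n` and `c` for the peptide termini; the function name is hypothetical.

```python
import re

# Assumption: Comet's residue field uses uppercase amino-acid letters,
# plus lowercase 'n' and 'c' for the peptide N- and C-terminus.
ALLOWED_RESIDUE_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZnc")

# Matches e.g. "variable_mod01 = 15.9949 M 0 3 -1 0 0 0.0"
VARIABLE_MOD_LINE = re.compile(r"^\s*(variable_mod\d+)\s*=\s*(\S+)\s+(\S+)")


def find_suspect_variable_mods(params_text):
    """Return (name, residues, bad_chars) for each suspicious variable_mod."""
    problems = []
    for line in params_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = VARIABLE_MOD_LINE.match(line)
        if not match:
            continue
        name, _mass, residues = match.groups()
        bad = sorted(set(residues) - ALLOWED_RESIDUE_CHARS)
        if bad:
            problems.append((name, residues, bad))
    return problems
```

Run on the text of a params file, this returns an empty list when every residue field looks like Comet notation, and it flags an entry such as `[^` so it can be changed to `n` by hand.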

I am attaching the comet params file I used to process this data.  

Cheers!
-David

P.S.  Please update your TPP to the latest 7.1.0 to get the latest features and bug-fixes!

ShagunGupta.comet.params

Shagun Gupta

Aug 15, 2024, 1:34:26 PM
to spctools-discuss
Hi David

Thank you for your prompt response. I’ll try out your suggestions today and will update you on the results or if I encounter any issues.

Quick question—are there any specific features in the latest version of TPP that make it particularly well-suited for DDA-LFQ analysis on Bruker instruments?


Best
Shagun

David Shteynberg

Aug 15, 2024, 3:07:37 PM
to spctools...@googlegroups.com
Hello Shagun,

I think there is nothing specific in the TPP that makes it more or less suited for a particular instrument, manufacturer, or experimental setup. However, it is important to give the tools good parameters in order to get good results, assuming good data.

As for new features in TPP 7.0.0 and beyond, StPeter, the label-free quantitation tool that computes spectral indices, can now separate the protein quantities in a single ProteinProphet protXML file "by run" or "by experiment", which helps the user compare protein quantities across runs or across experiments.  We hope this is useful.

Cheers!

-David


sudarshan kumar

Aug 16, 2024, 1:55:03 AM
to spctools...@googlegroups.com
I agree with this. This is a great new feature in the latest version (7.1.0). In earlier versions we had to calculate nSIN for every file separately; we then had to average the SIN scores across the biological replicates of each group before comparing the groups.

It's a good feature.
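
The manual averaging described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the per-run SIN values have already been read into dictionaries; the function and argument names are hypothetical and nothing here calls StPeter itself.

```python
from collections import defaultdict
from statistics import mean


def mean_sin_by_group(sin_by_run, group_of_run):
    """Average per-protein SIN values over the runs of each group.

    sin_by_run:   run name -> {protein: SIN value}
    group_of_run: run name -> group (condition) label
    """
    collected = defaultdict(lambda: defaultdict(list))
    for run, protein_sin in sin_by_run.items():
        group = group_of_run[run]
        for protein, sin in protein_sin.items():
            collected[group][protein].append(sin)
    # A protein missing from a run simply contributes no value for that run.
    return {
        group: {protein: mean(values) for protein, values in proteins.items()}
        for group, proteins in collected.items()
    }
```

The resulting per-group means can then be compared between groups, which is the step that the "by run" and "by experiment" output is meant to make easier.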

Best



--
-------------------------------------------------------------------
The real voyage of discovery consists not in seeking new lands but seeing with new eyes. — Marcel Proust

Dr. Sudarshan Kumar
(Fulbright-Nehru Fellow)
(B.V.Sc.& A.H., M.V.Sc., PhD.)
Sr. Scientist
Animal Biotechnology Center
(Proteomics and Cell Biology Lab.)
National Dairy Research Institute Karnal, 132001
Haryana, India
Contact No 09254912456
URL www.ndri.res.in
