Error in dataProcess step

Yuen Ping Chong

Mar 11, 2025, 11:49:34 PM
to MSstats
Hi, I am new to the MSstats package and am trying to analyze my LC-MS/MS results. However, I ran into this error when trying it out on one of the result sets (Screenshot 2025-03-12 113740.jpg).

Initially I thought this was because there is only 1 bioreplicate and 1 condition in the Excel file, so I tried combining 2 data frames from 2 different conditions, and I got this error instead (Screenshot 2025-03-12 114029.jpg).

May I know how I could rectify this? Thank you.

Anthony Wu

Mar 12, 2025, 5:31:33 PM
to MSstats
Hi,

Without having the actual dataset on hand, it would be difficult for me to diagnose the issue, but from past experience, it seems that you may not have a unique run ID for each bioreplicate.

If that is not the case, please attach a sample of your dataset and we can better diagnose the issue.
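
As a quick way to check the run ID point above, here is a minimal base-R sketch. It assumes the annotation uses the MSstats-style column names Run, BioReplicate, and Condition; the toy values are made up, and your file's columns may be named differently.

```r
# Toy annotation (made-up values) using MSstats-style column names.
annot <- data.frame(
  Run          = c("run1", "run2", "run2", "run3"),
  BioReplicate = c(1, 2, 3, 4),
  Condition    = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Keep one row per distinct Run / BioReplicate / Condition combination.
combos <- unique(annot[, c("Run", "BioReplicate", "Condition")])

# A run ID that appears in more than one combination is ambiguous.
n_per_run <- table(combos$Run)
ambiguous_runs <- names(n_per_run)[as.vector(n_per_run) > 1]

print(ambiguous_runs)
```

If this returns any run IDs, those runs are linked to more than one bioreplicate or condition and would need distinct IDs.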

Thanks,
Tony  

Yuen Ping Chong

Mar 17, 2025, 1:52:41 AM
to MSstats
Dear Tony, 

I tried to amend this by merging the raw data from the different bioreplicates and runs into one single file, and the screenshot shows the error I then ran into. I have also attached the log file and the dataset that I used for this trial run.

I appreciate your timely response and the time you are spending to look into this.

Regards,
YP
trial_combined.csv
MSstats_dataProcess_log_2025_03_17_13_39_44.165393.log
Screenshot 2025-03-17 134551.jpg

Anthony Wu

Mar 20, 2025, 7:29:22 PM
to MSstats
Hi,

The issue is that certain peptides are shared across multiple possible proteins. For example, for peptide "A.Q(-17.03)DSTSDLIPAPPLSK.V", the following proteins were quantified for run 2: NGAL_HUMAN, B2ZDQ1, and X6R8F3.

What upstream processing tool do you use for identification / quantification? E.g. MaxQuant, DIA-NN, FragPipe, etc. MSstats has converters as part of the MSstatsConvert package that can convert your quantification report into MSstats format. These converters filter out any peptides that are shared across multiple proteins.
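
To list such shared peptides yourself, a minimal base-R sketch along these lines may help. It assumes the table uses the MSstats-style column names ProteinName and PeptideSequence; the intensities and the second peptide/protein (P_EXAMPLE) are made up purely for illustration.

```r
# Toy quantification table; intensities and P_EXAMPLE are hypothetical.
shared_pep <- "A.Q(-17.03)DSTSDLIPAPPLSK.V"
quant <- data.frame(
  ProteinName     = c("NGAL_HUMAN", "B2ZDQ1", "X6R8F3", "P_EXAMPLE"),
  PeptideSequence = c(shared_pep, shared_pep, shared_pep, "K.EXAMPLEPEPTIDE.R"),
  Run             = c(2, 2, 2, 2),
  Intensity       = c(1000, 1000, 1000, 500),
  stringsAsFactors = FALSE
)

# One row per distinct peptide/protein pair, so repeated runs are not double counted.
pairs <- unique(quant[, c("PeptideSequence", "ProteinName")])

# Number of distinct proteins each peptide is assigned to.
n_proteins <- tapply(pairs$ProteinName, pairs$PeptideSequence, length)

# Peptides assigned to more than one protein.
shared <- names(n_proteins)[as.vector(n_proteins) > 1]

print(shared)
print(quant[quant$PeptideSequence %in% shared, ])
```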

Thanks,
Tony

Yuen Ping Chong

Mar 21, 2025, 2:18:19 AM
to MSstats
Hi Tony,

As the samples were run by a service provider, they only sent us the results, which they processed using PEAKS Studio. Unfortunately I do not have the raw data files, so I can't re-analyze them using software that is compatible with the package.

If the error is due to peptides shared between several proteins, does it mean that I should select only the peptides that are unique to a protein and remove the non-unique ones? In one of the files I do have a column named "unique" (as shown in the attachment). I assume that Y means yes and N means no, and that it indicates whether the peptide sequence is unique to the protein. Otherwise, should I manually remove the duplicated peptide sequences prior to using the dataProcess function?

Thank you for your time in this matter.

Sincerely,
YP
Screenshot 2025-03-21 141327.jpg

Anthony Wu

Mar 21, 2025, 11:06:59 AM
to MSstats
Yes, that is correct. You should select only the peptides that are unique to a protein, i.e. use that "unique" column to filter. If that doesn't work, manually remove the duplicated peptide sequences prior to using the dataProcess function.

We had a feature request in the past to create a PEAKS Studio converter. If you have a sample output from PEAKS Studio that you can share, we can look into creating a converter from PEAKS Studio to MSstats format.
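
Both options could look roughly like the base-R sketch below. The column names Peptide, Accession, and Unique are assumptions (I have not seen the actual PEAKS Studio export headers), and the rows are made-up toy data, so adjust the names to match your file.

```r
# Toy table; column names and values are hypothetical stand-ins for the export.
peaks <- data.frame(
  Peptide   = c("PEPA", "PEPB", "PEPB", "PEPC"),
  Accession = c("P1", "P2", "P3", "P4"),
  Unique    = c("Y", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

# Option 1: trust the export's own flag and keep rows marked "Y".
by_flag <- peaks[peaks$Unique == "Y", ]

# Option 2: recompute uniqueness by counting distinct proteins per peptide.
pairs  <- unique(peaks[, c("Peptide", "Accession")])
counts <- table(pairs$Peptide)
keep   <- names(counts)[as.vector(counts) == 1]
by_count <- peaks[peaks$Peptide %in% keep, ]

print(by_flag)
print(by_count)
```

Option 2 does not depend on the flag column, so it can also serve as a cross-check on Option 1 before the filtered table is passed on to dataProcess.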
