Evaluating MSstats Performance on Protein-Level vs PSM-Level Spectronaut Data

Sergio Ciordia

Nov 19, 2025, 7:51:39 AM

to MSstats

Dear MSstats Team,

I am writing because I have a question. I have a DIA experiment exported with Spectronaut (v20), and I would like to compare the MSstats analysis performed at two levels:

1. Exporting the “MSstats Report” list (PSM level) from Spectronaut. This analysis workflow is the standard one and works perfectly for us.
2. Exporting the “Protein Quant” list generated by Spectronaut. I know that your standard pipeline starts from PSMs, but I was wondering whether it would be possible to run only the MSstats statistical function at the protein level. We have tried to adapt the list of proteins to match what you obtain with dataProcess, but when attempting to run the statistical function, we get an error saying “the input must come directly from dataProcess”. Would it be possible to run the analysis in some way?

This comparison is only for educational purposes, to see the improvement when quantitative data are processed starting from one level or another.

Thank you very much in advance, and thank you for all your work for the proteomics community.

Best regards,
Sergio

Devon Kohler

Nov 19, 2025, 4:13:26 PM

to MSstats

Hi Sergio,

This is definitely something you can do. You really just need to ensure that you format the protein quant data exactly like the output of the dataProcess function. This means having a list with two elements: ProteinLevelData and FeatureLevelData. FeatureLevelData can just be a blank data.frame. ProteinLevelData needs to have the exact columns expected. You can see an example format using the code below!

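```
library(MSstats)

# SRMRawData is an example dataset shipped with MSstats
head(SRMRawData)
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
# Inspect the protein-level columns your own table needs to match
head(QuantData$ProteinLevelData)
```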

Your code would look something like:

# your_protein_table is a placeholder: put your own protein-level data.frame here
protein_quant_data = list("FeatureLevelData" = data.frame(),
                          "ProteinLevelData" = your_protein_table)
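Since the columns have to match exactly, one quick sanity check is to compare the column names of your own table against a real dataProcess output. The helper below is only a sketch: `compare_columns` is a hypothetical name (it is not part of MSstats), and the two small data frames are toy stand-ins. In practice `reference` would be `QuantData$ProteinLevelData`.

```r
# Hypothetical helper (not an MSstats function): list column-name mismatches
# between a hand-built protein table and a reference ProteinLevelData.
compare_columns <- function(candidate, reference) {
  list(
    missing = setdiff(names(reference), names(candidate)),  # in reference only
    extra   = setdiff(names(candidate), names(reference))   # in candidate only
  )
}

# Toy stand-ins for illustration; replace `reference` with
# QuantData$ProteinLevelData and `candidate` with your own table.
reference <- data.frame(RUN = 1, Protein = "P1", LogIntensities = 10)
candidate <- data.frame(Run = 1, Protein = "P1", LogIntensities = 10)

col_diff <- compare_columns(candidate, reference)
print(col_diff)
```

Both vectors being empty means the names line up. Column names in R are case-sensitive, so a mismatch such as `Run` vs `RUN` shows up in both lists.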

Hope this helps

Devon

CNB Proteómica

Nov 21, 2025, 11:25:01 AM

to MSstats

Hi Devon,

Thanks to your suggestion we were able to successfully run the script at the protein level using MSstats. To use our abundance table, we had to rename several columns and add these columns with a constant value of 1:

TotalGroupMeasurements, NumMeasuredFeature, MissingPercentage, more50missing, NumImputedFeature
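For anyone trying the same thing, here is a rough sketch of the renaming and column-filling step in base R. Several things in it are assumptions rather than anything confirmed in this thread: the input column names (`run_name`, `protein`, `quantity`, `condition`, `replicate`) are made up, the output column names other than the five listed above are what we would expect from `head(QuantData$ProteinLevelData)` and should be checked against it, and the `log2` step assumes the exported quantities are on the raw scale while dataProcess reports log2 intensities.

```r
# Toy protein-level table in long format. These input column names are
# hypothetical; use whatever your own export provides.
protein_table <- data.frame(
  run_name  = c("run1", "run2", "run3", "run4"),
  protein   = "P12345",
  quantity  = c(1024, 2048, 4096, 8192),  # assumed raw-scale intensities
  condition = c("A", "A", "B", "B"),
  replicate = c(1, 2, 3, 4)
)

# Rename to the column names assumed for ProteinLevelData (verify these
# against a real dataProcess output before relying on them).
protein_level <- data.frame(
  RUN            = factor(as.integer(factor(protein_table$run_name))),
  Protein        = protein_table$protein,
  LogIntensities = log2(protein_table$quantity),  # assumption: log2 scale
  originalRUN    = factor(protein_table$run_name),
  GROUP          = factor(protein_table$condition),
  SUBJECT        = factor(protein_table$replicate)
)

# Summary columns filled with the constant 1, as described above
summary_cols <- c("TotalGroupMeasurements", "NumMeasuredFeature",
                  "MissingPercentage", "more50missing", "NumImputedFeature")
protein_level[summary_cols] <- 1

# Wrap in a list shaped like the dataProcess output
protein_quant_data <- list(FeatureLevelData = data.frame(),
                           ProteinLevelData = protein_level)
str(protein_quant_data$ProteinLevelData)
```

The resulting `protein_quant_data` list is the object that is then handed to groupComparison.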

After that, we used the groupComparison function to perform the differential analysis and we didn't get any errors, but we were wondering whether these columns we filled with arbitrary values are actually relevant for the statistical analysis, or if it does not matter what values they have. We just want to make sure the results are valid.

Thank you very much.

Best regards,
Sergio

Devon Kohler

Nov 21, 2025, 4:43:49 PM

to MSstats

Hi Sergio,

Filling these columns with arbitrary values is perfectly fine. They are just used to provide an idea of any potential issues in protein-level summarization. As far as I can tell, your results should be perfectly valid.