negative imputed values


Christian Schori

Dec 14, 2022, 8:12:50 AM
to MSstats
Dear MSstats Team

We've recently observed negative imputed intensity values from MSstats in dataProcess > FeatureLevelData > newABUNDANCE. Can you elaborate on the reason for these negative newABUNDANCE values?
Since the newABUNDANCE column should contain log2 intensity values, can you also comment on the potential downstream implications of such negative values?
I've attached the MSstats.csv output (as an .Rda file) from FragPipe v.18 (DDA data). We've also observed negative values in newABUNDANCE in data originating from Spectronaut v.16 & 17, FragPipe v.18 & 19, and DIA-NN v.1.8.1 (DIA data).

Thank you already in advance for looking into this issue.

Best,
Christian

SessionInfo:
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_CH.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
 [1] forcats_0.5.2   stringr_1.5.0   dplyr_1.0.10    purrr_0.3.4     readr_2.1.2     tidyr_1.2.0     tibble_3.1.8    ggplot2_3.3.6   tidyverse_1.3.2 MSstats_4.4.1 
FP18_msstats.rda

Mateusz Staniak

Dec 14, 2022, 5:29:07 PM
to MSstats
Hi,

thank you for reporting the issue. I will look into this as soon as I can.


Kind regards
Mateusz Staniak

Mateusz Staniak

Dec 20, 2022, 9:19:51 AM
to MSstats
Hi,

unfortunately I can't reproduce the problem so far. I'll keep trying with this particular example, but when I try to fit the survival model, memory usage immediately goes to almost 100% and I can't get the output. Do you have another example perhaps?


Kind regards,
Mateusz

Christian Schori

Dec 20, 2022, 11:15:10 AM
to MSstats
Hi Mateusz

I've run the analysis on a machine with 200 GB of RAM, which is maybe the reason why I did not notice this limitation... :-)
Anyway, I've just uploaded the original data for my other issue (regarding NAs in newABUNDANCE). In this dataset you can also find negative imputed values in newABUNDANCE. Hopefully this one won't need as much RAM.

Best,
Christian

Carlos Gonzalez

Mar 5, 2024, 6:48:47 PM
to MSstats
Hi all,
Just to chime in here and confirm that I have also just run into negative imputed values. I checked the raw FragPipe output and do not see any negative values (df$ProteinName[which(df$Intensity < 0)]). It is definitely imputation, as checking the input reveals that the experimental group in question had no values present in the input. Interestingly, for that protein, other experimental groups with the same level of missingness did NOT have negative values imputed, just this one, so my guess is that it's fairly random and may have to do with some corner case in model convergence?

Anyway, for now I am just removing them in our pipeline, but I wanted to confirm the other users' experience!
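
The input check described above can be sketched in base R as follows. This is a minimal illustration on a made-up table, not the poster's actual data: the data frame df and its values are hypothetical, and only the column names ProteinName and Intensity are taken from the snippet in the post. It also looks for intensities between 0 and 1, since those become negative on the log2 scale without any imputation being involved.

```r
# Toy stand-in for an MSstats-format input table; all values are made up.
df <- data.frame(
  ProteinName = c("P1", "P1", "P2", "P2"),
  Intensity   = c(1500.2, NA, 0.4, 820.0)
)

# Proteins with a negative raw intensity (none in this toy table).
neg_proteins <- unique(df$ProteinName[which(df$Intensity < 0)])

# Rows with an intensity in (0, 1): log2 of these is negative even
# before any imputation. which() drops the NA comparisons.
sub_one <- df[which(df$Intensity > 0 & df$Intensity < 1), ]

print(neg_proteins)
print(sub_one)
```

If both results are empty for the real input, negative values in newABUNDANCE cannot come from the raw intensities themselves.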

Cheers
Carlos

Anthony Wu

Mar 6, 2024, 5:07:15 PM
to MSstats
Hi Carlos,

To help our team determine the root cause, could you provide us with a sample dataset that is causing MSstats to impute with negative values?

Thanks,
Tony

Jana Z

Apr 10, 2025, 1:55:11 PM
to MSstats
Dear MSstats team, 

has there been any resolution to this topic of negative imputed intensities?

I have also observed negative imputed values, first with MSstats v4.2.0, when running the dataProcess function on large sample sets with a sizeable number of conditions (e.g. 198 samples, 66 conditions). After switching to MSstats v4.10.1 there were fewer negative values, but some remained. This seems to affect a small portion of features (e.g. 0.02% of rows in the feature table in my case), but it can lead to negative protein intensities (0.0015% of rows in the protein table) and consequently nonsensical log2FC after running the groupComparison function.
There are no negative imputed values when I run the same samples in the smallest possible subsets (i.e. 6 samples and 2 conditions). Adding another subset (12 samples and 4 conditions in total) again leads to some negative imputed feature intensities.
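
For anyone wanting to quantify the affected rows in the same way, here is a rough base R sketch. It runs on a made-up table rather than real dataProcess output: the newABUNDANCE column name comes from this thread, while feature_data, the PROTEIN column, the is_imputed flag, and all values are assumptions for illustration only.

```r
# Toy stand-in for a feature-level table; values and the is_imputed
# flag are hypothetical, only newABUNDANCE is named in the thread.
feature_data <- data.frame(
  PROTEIN      = c("P1", "P1", "P1", "P2", "P2", "P2"),
  newABUNDANCE = c(12.1, -3.4, 10.8, NA, 9.5, -0.7),
  is_imputed   = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Flag rows with a negative log2 abundance, treating NA as not negative.
neg <- !is.na(feature_data$newABUNDANCE) & feature_data$newABUNDANCE < 0

# Share of rows affected, in percent, and the proteins they belong to.
pct_negative <- 100 * mean(neg)
affected_proteins <- unique(feature_data$PROTEIN[neg])

# Cross-tabulate negative rows against the (hypothetical) imputation flag.
print(table(negative = neg, imputed = feature_data$is_imputed))
```

The list of affected proteins can then be used to inspect or exclude those proteins before running the group comparison.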

Do you have any suggestions on how to handle this odd imputation behaviour?

Thank you!
Jana

Devon Kohler

Apr 14, 2025, 9:30:21 AM
to MSstats
Hi Jana,

Are you getting any convergence warnings from the dataProcess function? It could be that the imputation algorithm is not converging which would lead to weird imputed values.
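
One generic way to collect warnings from a long-running call in R is withCallingHandlers. The sketch below uses a hypothetical stand-in function, noisy_step, with a made-up warning message. It assumes convergence problems surface as ordinary R warnings, which is not confirmed in this thread; they may instead appear only in console messages or a log file.

```r
# Hypothetical stand-in for a long-running call such as dataProcess();
# the warning text is made up for illustration.
noisy_step <- function() {
  warning("model did not converge")
  "result"
}

captured <- character(0)

# Record every warning message, then let the computation continue.
result <- withCallingHandlers(
  noisy_step(),
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Keep only the messages that mention convergence.
convergence_related <- grep("converge", captured,
                            ignore.case = TRUE, value = TRUE)
print(convergence_related)
```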

In general I do not believe negative imputed values are necessarily a problem because they are on the log scale, but if you are visually seeing weird results and incorrect fold changes then there is probably a deeper issue.
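
To make the log-scale point concrete, here is a small arithmetic sketch in R with made-up numbers. A negative log2 value corresponds to a raw intensity between 0 and 1, which is still positive, but mixing such a value with typical abundances produces a very large fold change.

```r
# Made-up log2 abundances, including two negative ones.
log2_values <- c(12.1, 0, -1, -3.4)

# Back-transform to the raw intensity scale: every value stays positive,
# and negative log2 values map to intensities below 1.
raw_scale <- 2^log2_values
print(raw_scale)

# A log2 fold change is a difference of log2 values. Comparing a typical
# abundance of 10 with an imputed value of -3 gives a difference of 13.
log2fc <- 10 - (-3)
fold <- 2^log2fc
print(fold)
```

A negative value is therefore valid mathematically, but an imputed value far below the observed range can dominate the estimated fold change.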

Devon
