newABUNDANCE = NA, despite available ABUNDANCE value

Christian Schori


Dec 15, 2022, 11:12:42 AM

to MSstats

Dear MSstats team,

I've recently observed NAs in the newABUNDANCE column (dataProcess > FeatureLevelData$newABUNDANCE) that I don't understand. These NAs occur independently of the "censored" column and independently of the values in INTENSITY/ABUNDANCE. Can you please explain why these newABUNDANCE entries are NA and how this affects the intensity roll-up to ProteinLevelData?
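To pin down exactly which rows are affected, one can filter the feature-level table for entries that have a measured ABUNDANCE, are not flagged as censored, and still have an NA newABUNDANCE. The sketch below uses a small made-up data frame (`feature_data`, hypothetical values) that mimics only the relevant FeatureLevelData columns; on real output one would substitute the FeatureLevelData element of the dataProcess result.

```r
# Toy stand-in for dataProcess()$FeatureLevelData; all values are made up
# and only the columns relevant to this question are included.
feature_data <- data.frame(
  FEATURE      = c("PEP1_2", "PEP1_2", "PEP2_3", "PEP2_3"),
  RUN          = c(1, 2, 1, 2),
  censored     = c(FALSE, FALSE, FALSE, TRUE),
  INTENSITY    = c(52000, 1.5, 80000, NA),
  ABUNDANCE    = c(15.67, 0.58, 16.29, NA),
  newABUNDANCE = c(15.67, NA, 16.29, NA)
)

# Rows with a measured ABUNDANCE that are not flagged as censored,
# yet have newABUNDANCE = NA (the pattern described above)
unexplained_na <- subset(
  feature_data,
  is.na(newABUNDANCE) & !is.na(ABUNDANCE) & !censored
)
print(unexplained_na)

# Number of such rows per run
print(table(unexplained_na$RUN))
```

In the toy data only the second row (a very low ABUNDANCE value) matches; the fourth row is excluded because it has no ABUNDANCE and is already flagged as censored.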

I've uploaded the dataProcess output of a Spectronaut (v. 16) report here. But I've also observed these NAs in data from Spectronaut 17, DIA-NN 1.8.1 and FragPipe DIA/DDA (v. 18 & 19).

Thank you for looking into this.

Best,

Christian

SessionInfo:

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_CH.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_CH.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] gridExtra_2.3 vroom_1.5.7 forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.2 MSstats_4.4.1

Mateusz Staniak


Dec 20, 2022, 7:22:21 AM

to MSstats

Hi,

Can I see the dataProcess input and parameters? This is most likely related to the procedure we apply after normalization (https://github.com/Vitek-Lab/MSstats/blob/master/R/utils_censored.R), which treats very small values (depending on quantiles of the ABUNDANCE distribution) as censored missing values, but I want to double-check. The transformations that have already happened in dataProcess make it unclear to me. Thank you for reporting the possible issue and providing the data.
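As a rough illustration of the kind of rule described here, the sketch below flags abundances that fall below a quantile-based lower cutoff and drops them from newABUNDANCE. The helper `flag_low_as_censored`, the Q1 - k * IQR cutoff and the default `k = 1.5` are all hypothetical assumptions chosen for illustration; the exact rule in utils_censored.R may differ.

```r
# Hypothetical helper (not MSstats code): flag abundances below a
# quantile-based cutoff as censored and set them to NA in newABUNDANCE.
# The cutoff rule Q1 - k * IQR and the default k are assumptions.
flag_low_as_censored <- function(abundance, k = 1.5) {
  q <- quantile(abundance, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  cutoff <- q[1] - k * (q[2] - q[1])
  # Values that are already NA stay NA and are not flagged by this rule
  low <- !is.na(abundance) & abundance < cutoff
  list(
    cutoff = cutoff,
    censored = low,
    newABUNDANCE = ifelse(low, NA_real_, abundance)
  )
}

# Made-up log2-scale abundances with one very small value
abundance <- c(15.7, 16.3, 14.9, 15.2, 16.0, 0.6, NA)
res <- flag_low_as_censored(abundance)
print(res$cutoff)
print(res$newABUNDANCE)
```

With these inputs the cutoff is about 13.55, so only the value 0.6 is flagged and replaced with NA. Because the cutoff depends on the whole distribution, which values are dropped can change after normalization even when the raw intensities are unchanged.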

Kind regards

Mateusz

Christian Schori


Dec 20, 2022, 11:07:51 AM

to MSstats

Hi Mateusz

Thank you for looking into this issue. I've actually only used default settings for the whole process (but I also see the NAs in newABUNDANCE if I only process the top-100 feature subset).

library(tidyverse)
library(MSstats)

# Spectronaut 16 report and run annotation
SN_report <- read.csv("~/PUMA/Christian/Spectronaut17_benchmark/SN16/20221206_092357_SN16_ec_spikein_Report.csv")
annotation <- read.csv("~/PUMA/Christian/Spectronaut17_benchmark/SN16/annotation.csv")

# Convert the report to MSstats format
SNtoMSstats <- SpectronauttoMSstatsFormat(SN_report, annotation)

# Run dataProcess with default settings
SN_dataprocess <- dataProcess(SNtoMSstats)

I've just uploaded the original files (Spectronaut report and annotation file; sorry, they're quite big) in case you'd like to reproduce the whole process. You can find the files here.
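On the roll-up question: with default settings dataProcess summarizes features per protein and run using summaryMethod = "TMP" (Tukey's median polish). The sketch below uses base R's `stats::medpolish` on a made-up feature-by-run matrix to show how NA entries are simply skipped in the fit, assuming such entries are left out rather than imputed. It illustrates the technique only and is not MSstats' own summarization code.

```r
# Made-up log2 abundances for one protein: rows = features, columns = runs.
# The NA entries stand in for newABUNDANCE values that are NA.
abund <- matrix(
  c(15.7, 16.0, 15.9,
    14.2,   NA, 14.5,
    17.1, 17.3,   NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("F1", "F2", "F3"), c("Run1", "Run2", "Run3"))
)

# Tukey's median polish; na.rm = TRUE drops NAs from every median
fit <- medpolish(abund, na.rm = TRUE, trace.iter = FALSE)

# Run-level protein estimate = overall effect + run (column) effect
protein_per_run <- fit$overall + fit$col
print(protein_per_run)
```

Each run still gets an estimate, but it rests on fewer features, so NA entries in newABUNDANCE mainly reduce how much information goes into the protein-level value for that run.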