publications on logCPM transformation of MS intensities


Ivo Kwee

Jul 26, 2022, 8:11:57 AM
to Omics Playground
Sorry for the late question. Are there any peer-reviewed publications explaining the logic behind the logCPM transformation of mass spectrometry intensities?

[originally from Felipe da Veiga Leprevost on GitHub, 27 July 2022]

Ivo Kwee

Jul 26, 2022, 8:12:50 AM
to Omics Playground

BigOmics Analytics Team

Jul 27, 2022, 4:41:05 AM
to Omics Playground
Felipe: "I'm familiar with most of those publications. However, I don't recall reading about applying statistical transformations designed for discrete distributions to relative ion abundances. Please correct me if I'm wrong. I would understand applying them to spectral counts or peptide counts, but intensities originate from completely different technologies with different properties."

BigOmics Analytics Team

Jul 27, 2022, 5:14:05 AM
to Omics Playground
OK. So you are distinguishing between spectral counting and intensities (including LFQ)? Up to now, we have mostly handled proteomics data from LC-MS/MS as intensities (or LFQ intensities).

Yes, for discrete distributions, logCPM scales the counts of each sample so that they sum to one million, then takes the logarithm. On continuous signals such as intensities, "logCPM" scales the intensities of each sample to "million intensity units", then takes the logarithm. This is equivalent to taking the logarithm and shifting all values of a sample by a constant (the log of one million minus the log of that sample's total intensity). Maybe it would be better to call it logPMI ("log per million intensity"), but the calculation would still be the same.
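A minimal sketch of the equivalence described above, in Python with NumPy. The function name `log_per_million`, its `prior` parameter, and the toy matrix are illustrative assumptions, not Omics Playground's actual implementation. It assumes a features-by-samples intensity matrix and uses no pseudocount by default (log-CPM implementations for counts often add a small prior count, which makes the equivalence only approximate at low values).

```python
import numpy as np

def log_per_million(X, prior=0.0):
    """Hypothetical helper: log2 'per million intensity' for a features-by-samples matrix.

    Each column (sample) is rescaled so its intensities sum to one million,
    then `prior` (0 by default) is added before taking log2.
    """
    X = np.asarray(X, dtype=float)
    col_sums = X.sum(axis=0)      # total intensity per sample
    scaled = X / col_sums * 1e6   # broadcasting divides each column by its own sum
    return np.log2(scaled + prior)

# Toy intensities (made up for illustration): 3 proteins x 2 samples
X = np.array([[2.0e5, 4.0e5],
              [5.0e6, 9.0e6],
              [1.5e4, 2.5e4]])

pmi = log_per_million(X)

# Per-sample shift: log2(1e6) - log2(total intensity of that sample)
shift = np.log2(1e6) - np.log2(X.sum(axis=0))

# With prior = 0, the result is exactly log2(X) shifted by one constant per sample
print(np.allclose(pmi, np.log2(X) + shift))  # True

# Log-ratios between proteins within a sample are unchanged by the shift
print(np.allclose(pmi[0] - pmi[1], np.log2(X[0]) - np.log2(X[1])))  # True
```

Because the shift is constant within a sample, it leaves within-sample log-ratios untouched and acts as a per-sample normalization to total intensity; only a positive `prior` makes the result deviate from a pure shift, mostly for low intensities.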

Do you have doubts about the "CPM" step or the "logarithm" step for continuous intensities?

Ivo

Felipe da Veiga Leprevost

Jul 27, 2022, 10:56:31 AM
to Omics Playground
Doubt would be a strong word. I have been working on proteomics data analysis for some years now, and I have never heard of this approach. As is customary in the scientific field, when a new analysis method is introduced, a white paper or a peer-reviewed paper usually follows it. I understand that this might be a well-known method for handling RNA-Seq (count) data, but the proteomics community has been using different approaches, and there are some reports of methods borrowed from the genomics field underperforming.