how to treat standard-normalized data (aka. relative expression as percentage)

Ivo Kwee

Nov 7, 2024, 5:48:06 AM
to Omics Playground
[email from C.Moritz 4 Nov. 2024]

1) I like to work with standard-normalized instead of quantile-normalized data.
As a consequence of standard normalization, the average of each sample is zero. Half of the data are above zero, the other half below zero. However, BigOmics showed an error message saying that it can't handle negative values and sets them all to zero, which might bias my data. Is there a simple way to include negative values as well? I was thinking about doing a log transformation after standard normalization.

Ivo Kwee

Nov 7, 2024, 6:12:27 AM
to Omics Playground
Hi C.Moritz,

Looking at your data, you are most probably dealing with "normalized relative proportion" data from multiplexed protein abundances. The range of your values is [-0.9,98.9], which suggests they are percentage values that your normalization has shifted below zero. Proportion data (%) are usually scaled so that each feature sums to 100% across samples. The scale is still linear, and absolute quantification of the protein is no longer possible. This kind of data can be analyzed using Omics Playground, but it needs some extra care in your case.
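To make the "each feature sums to 100% across samples" scaling concrete, here is a minimal sketch in plain Python. The function name and the intensity values are made up for illustration; this is not how your data were necessarily produced, only the generic form of the scaling.

```python
def to_percent_across_samples(row):
    """Scale one feature's abundances so they sum to 100 across samples."""
    total = sum(row)
    if total <= 0:
        raise ValueError("feature total must be positive")
    return [100.0 * v / total for v in row]


# Hypothetical intensities of one protein measured in four samples.
intensities = [200.0, 100.0, 50.0, 50.0]
percent = to_percent_across_samples(intensities)

print(percent)       # relative proportions, still on a linear scale
print(sum(percent))  # the feature now sums to 100 across samples
```

After this step only the relative distribution of the protein over the samples is left, which is why the absolute quantity cannot be recovered.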

The problem in your case is that linear-scale data should not have negative values, but your normalization to zero mean (by column, i.e. per sample) has shifted the lowest values towards -1. Also note that zero-mean centering (columnwise) does set the average to zero, but it does not (as you suggested) put half of the values above zero and half below. That is what median (not mean) centering does. In fact, median centering (maxMedian) is the standard normalization we use for proteomics.
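The difference between mean and median centering is easy to see on a small skewed example. The sketch below uses only Python's standard library; the sample values are invented and the helper `center` is hypothetical, not an Omics Playground function.

```python
from statistics import mean, median


def center(values, stat):
    """Subtract a summary statistic (mean or median) from every value."""
    c = stat(values)
    return [v - c for v in values]


# Hypothetical sample column: mostly small percentages plus one large value.
sample = [0.1, 0.2, 0.3, 0.5, 98.9]

mean_centered = center(sample, mean)
median_centered = center(sample, median)

# Count how many values end up below and above zero in each case.
below_mean = sum(v < 0 for v in mean_centered)
above_mean = sum(v > 0 for v in mean_centered)
below_median = sum(v < 0 for v in median_centered)
above_median = sum(v > 0 for v in median_centered)

print(below_mean, above_mean)      # unbalanced split after mean centering
print(below_median, above_median)  # balanced split after median centering
```

With skewed data like this, a single large value pulls the mean up, so most values fall below zero after mean centering, whereas median centering splits them evenly by construction.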

OPG expects either linear counts without negative values or log2-transformed values. Again, you seem to have linear data with negative values. The best option would be to upload your non-normalized relative expression data, with all values within [0,100].

Another trick would be to simply add 1 to your data so that all values become positive. If you do, Omics Playground (OPG) should detect that the values are linear, and internally we will use the log2-transformed values for statistical testing.
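As a rough illustration of what the shift followed by a log2 transform does to the numbers, here is a small sketch. It is not the OPG code; the function name, the offset parameter, and the example values (chosen to span the [-0.9,98.9] range mentioned above) are assumptions for illustration.

```python
import math


def shift_and_log2(values, offset=1.0):
    """Add a constant offset, check positivity, then take log2."""
    shifted = [v + offset for v in values]
    if min(shifted) <= 0:
        raise ValueError("offset too small: some values are still <= 0")
    return [math.log2(v) for v in shifted]


# Hypothetical values covering the reported range of the data.
values = [-0.9, 0.0, 1.0, 98.9]
logged = shift_and_log2(values)

print(logged)
```

The check matters because an offset of 1 is only enough when the minimum is above -1; a value of exactly -1 or lower would still fail the log transform.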

Hope this helps,

Ivo Kwee
BigOmics Team





cm-histogram.png