Questions about batch correction

Mauro Masiero

Feb 24, 2023, 8:49:24 AM

to Omics Playground

[2023.02.24 email from Simone M.]

I have a couple of questions about batch correction and bigomics.

To do batch correction, I first tried ComBat because it gave me a new adjusted raw table of counts, but after reading some papers I’m not sure this is a good input for downstream analysis. Then I used the removeBatchEffect function in R. The output in this case is obtained from logCPM values. Again, I’m not sure this is a good input to provide to bigomics.

If I decide to do batch correction outside bigomics, what would be the best method to use to obtain a good input for the platform?

In the methods that you propose, you use machine learning to adjust the data. I’m curious to know: what does the output of this correction look like?

Mauro Masiero

Feb 24, 2023, 9:53:56 AM

to Omics Playground

Thank you for your email!

There is no single method of choice for batch correction. Our approach is usually to test several methods and check which one performs best, then use the normalized values as input to Omicsplayground. Since each dataset faces its own set of batch effects, different methods often perform better on different datasets. In Omicsplayground, we normalize the data to counts per million (CPM) and then log-transform it. We also apply an extra quantile normalization step, which can sometimes remove batch effects, but not always.
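To make the normalization steps above concrete, here is a minimal sketch in plain Python of CPM scaling, log transformation, and quantile normalization. It is an illustration only, not the platform's actual code: the function names, the `prior` offset of 1, and the use of log2 are assumptions, and ties in the quantile step are broken by position rather than averaged.

```python
import math

def log_cpm(counts, prior=1.0):
    """counts: list of gene rows, each a list of raw counts per sample.
    Returns log2(CPM + prior); the prior value is an assumption."""
    n_samples = len(counts[0])
    # Library size = total counts per sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2(row[j] / lib_sizes[j] * 1e6 + prior) for j in range(n_samples)]
        for row in counts
    ]

def quantile_normalize(matrix):
    """Force every sample (column) to share the same value distribution."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_genes)] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / n_samples
                 for k in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered by their value in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result

# Toy example: 3 genes x 2 samples.
counts = [[10, 20], [0, 5], [90, 75]]
normalized = quantile_normalize(log_cpm(counts))
```

After this step every sample has the same set of values, only assigned to different genes, which is why it can absorb some global batch shifts but not gene-specific ones.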

Regarding inputs, Omicsplayground accepts raw or normalized counts in counts.csv, or log-scaled data (logCPM) in expression.csv. In the latter case, provide expression.csv instead of counts.csv.

Ivo mentioned some additional (semi-)unsupervised batch correction methods, in case you want to check them out:

SVA (surrogate variable analysis), PCA correction, or NNM (nearest neighbour matching). The correction is applied to log-scaled data, but the output is converted back to counts (normalized raw values).
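As a rough idea of what "correct in log space, then convert back to counts" can look like, here is a small sketch in plain Python. Note that it uses simple per-batch mean-centering as a stand-in, not SVA, PCA correction, or NNM, and the function names and the `prior` of 1 are assumptions for illustration only.

```python
def center_batches(log_expr, batches):
    """Remove per-gene batch mean shifts, keeping each gene's overall mean.
    log_expr: list of gene rows (log-scaled); batches: one label per sample."""
    corrected = []
    for row in log_expr:
        overall = sum(row) / len(row)
        batch_means = {}
        for b in set(batches):
            vals = [v for v, label in zip(row, batches) if label == b]
            batch_means[b] = sum(vals) / len(vals)
        corrected.append(
            [v - batch_means[b] + overall for v, b in zip(row, batches)]
        )
    return corrected

def log_to_counts(log_expr, prior=1.0):
    """Invert log2(x + prior); clip at zero so counts stay non-negative."""
    return [[max(2.0 ** v - prior, 0.0) for v in row] for row in log_expr]

# Toy example: one gene, four samples, two batches.
log_expr = [[1.0, 2.0, 5.0, 6.0]]
batches = ["A", "A", "B", "B"]
corrected_counts = log_to_counts(center_batches(log_expr, batches))
```

Plain mean-centering like this ignores the experimental design, so if batch and condition are confounded it will also remove biological signal. The output is a table of adjusted, count-like values rather than integer raw counts.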

Kind regards,

Mauro Masiero, Dr. sci. ETH Zürich

BigOmics Team
