[From Marian M., email 1.3.2023]
Basically, we performed mRNA sequencing of primary mouse mast cells, WT and with a gene-specific knockout. All samples are in quadruplicate, and we would like to do differential gene expression analysis for both setups. The samples are labeled 1-4, according to the experiment batch. In the initial PCA we ran, we noticed that our samples cluster to some extent by experiment ‘batch’. So we would like to correct for this and see whether it improves the data.
I tried to do this in omicsplayground (a supervised correction based on the batch, plus an unsupervised correction with PCA), and it seems to have improved our data a lot. Please see the attached example. Since we are not sure exactly how the batch correction works, and I am not confident in my knowledge of this, we would like to ask exactly how it works and which parameters are best to use without introducing too much bias into our data. In addition to the supervised correction, should I add an unsupervised method such as PCA? And what is the difference between the options for this (PCA, SVA, NNM)? I noticed that the clustering changes somewhat depending on which unsupervised correction I apply. I hope you can help me understand this function and make the most of our analysis. Thank you!
Best regards,
Marian