MOCCASIN correction


Théa Mmn

Nov 17, 2025, 8:17:49 AM
to Biociphers

Hi everyone,

I'm using MOCCASIN prior to running MAJIQ in order to correct for covariates (RIN, sequencing batch, etc.). Although MOCCASIN reports that the correction was applied successfully, I’m still seeing residual structure in my data.

I’ve attached UMAPs before and after correction, and while there is some improvement, the separation by batch/covariate is still clearly visible.

Has anyone encountered something similar? 

Any advice or shared experience would be greatly appreciated!


afterMOC_RIN.png

beforeMOC_RIN.png

bsl...@seas.upenn.edu

Nov 18, 2025, 11:08:57 AM
to Biociphers

Dear Thea,


If you have not already, please see the MOCCASIN documentation, which includes advice for building your model matrix and remarks on RIN specifically:

https://biociphers.bitbucket.io/majiq-docs/getting-started-guide/moccasin.html


In brief: MOCCASIN learns a linear model of the confounding variation and then sets the confounders to zero. For RIN, this means you probably want to use 10-RIN rather than the original RIN, so that “zero” means high-integrity rather than degraded RNA. Another option is to bin the RIN values into three or four categories and use 1/0 indicator columns for those categories in the model matrix. In either case, I recommend you include an intercept column. If you bin RIN into categories, then (as with batches) the category whose column you leave out is the one represented by the intercept, and hence the one the model adjusts to. Please see the docs linked above for more information about this.
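To make the model-matrix advice concrete, here is a minimal sketch in plain NumPy. The sample values, the RIN cut point of 8, and the helper name `indicator_columns` are illustrative assumptions, not part of MOCCASIN; how the resulting matrix is passed to MOCCASIN is covered in the docs linked above and is not shown here.

```python
import numpy as np

def indicator_columns(labels, reference):
    """Return 1/0 indicator columns for every level except `reference`.

    The left-out reference level is the one represented by the intercept.
    (Hypothetical helper for illustration only.)
    """
    labels = np.asarray(labels)
    levels = [lv for lv in sorted(set(labels.tolist())) if lv != reference]
    columns = [(labels == lv).astype(float) for lv in levels]
    return columns, levels

# Toy covariates for six samples (made-up values).
rin = np.array([9.1, 7.4, 8.8, 6.2, 9.5, 7.9])
batch = np.array(["A", "B", "A", "B", "B", "A"])
intercept = np.ones(len(rin))

# Batch indicators, with batch "A" as the reference level.
batch_cols, batch_levels = indicator_columns(batch, reference="A")

# Option 1: continuous 10 - RIN, so that zero corresponds to RIN = 10.
X_continuous = np.column_stack([intercept, 10.0 - rin] + batch_cols)

# Option 2: binned RIN (0 = RIN < 8, 1 = RIN >= 8), with the
# high-integrity bin as the reference level.
rin_bin = np.digitize(rin, [8.0])
bin_cols, bin_levels = indicator_columns(rin_bin, reference=1)
X_binned = np.column_stack([intercept] + bin_cols + batch_cols)
```

In both matrices, a sample with all confounder columns at zero is a high-RIN sample from batch "A", which is the state the adjustment moves every sample toward. It is worth checking that indicator columns are not identical to each other, since collinear columns make the coefficients ambiguous.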


Going further: since MOCCASIN applies a linear adjustment, it will only remove variation that a linear model of the supplied columns can capture. To remove more variation, one option is to include more confounding factors in the model matrix. These can include unknown confounders or nonlinear functions of known confounders, for example.
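As one example of a nonlinear function of a known confounder, the sketch below adds powers of 10-RIN as extra columns. The function name and the choice of polynomial terms are assumptions for illustration. Powers of 10-RIN are used (rather than a centered variable) so that every added column is still zero at RIN = 10.

```python
import numpy as np

def polynomial_terms(deficit, degree=2):
    """Return columns deficit**1 ... deficit**degree.

    `deficit` is 10 - RIN, so every column is zero when RIN = 10.
    (Hypothetical helper for illustration only.)
    """
    deficit = np.asarray(deficit, dtype=float)
    return np.column_stack([deficit ** p for p in range(1, degree + 1)])

# Toy 10 - RIN values for three samples (made-up values).
terms = polynomial_terms([0.0, 1.0, 2.0], degree=3)
```

These columns would replace the single 10-RIN column in the model matrix. Each added column costs a degree of freedom, so with few samples a low degree is the safer choice.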


For analysis, you can quantify the variation explained by the confounders by fitting the model with and without the confounding variables and comparing the residuals. This is one way of quantifying “how much” confounding variation MOCCASIN removed. (It is possible to do this today, but it is not very user-friendly.)
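The residual comparison can be sketched outside MOCCASIN with ordinary least squares. This is a rough illustration under stated assumptions, not MOCCASIN's own procedure: `y` stands for some per-sample quantity for a single junction (simulated here), `X_base` holds the columns you want to keep (here just an intercept), and `X_conf` holds the confounder columns.

```python
import numpy as np

def residual_sum_of_squares(X, y):
    """Fit y ~ X by least squares and return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def fraction_explained_by_confounders(X_base, X_conf, y):
    """Share of the reduced-model residual variation removed by adding X_conf."""
    rss_reduced = residual_sum_of_squares(X_base, y)
    rss_full = residual_sum_of_squares(np.column_stack([X_base, X_conf]), y)
    return (rss_reduced - rss_full) / rss_reduced

# Simulated toy data: 50 samples whose values drift with 10 - RIN.
rng = np.random.default_rng(0)
n = 50
deficit = rng.uniform(0.0, 4.0, size=n)              # 10 - RIN
y = 0.5 - 0.05 * deficit + rng.normal(0.0, 0.02, n)  # simulated signal + noise

X_base = np.ones((n, 1))        # intercept only
X_conf = deficit[:, None]       # confounder column
frac = fraction_explained_by_confounders(X_base, X_conf, y)
print(f"fraction of residual variation explained by confounders: {frac:.2f}")
```

A value near 0 means the confounder columns explain little of the variation in that quantity; a value near 1 means they explain most of it. Repeating this per junction and looking at the distribution gives a rough picture of how much a linear adjustment can remove.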


Please let me know if you have additional questions.


Best Regards,

Barry
