Cell state as output


eric....@gmail.com

Aug 15, 2017, 9:03:34 AM
to Perturb-seq
In the documentation, you mention

A Cell state classifier is defined on wildtype or control cells and then applied to all cells in an experiment. These classifications can be used as outputs to be predicted (instead of gene expression) or as covariates in the model.

I want to use the Infomap cell states as outputs, but the function run_model relies on a linear model that can't reasonably accommodate categorical response variables. Is there another mimosca function that I should look for?

Thanks!

Atray Dixit

Aug 15, 2017, 4:52:15 PM
to Perturb-seq
Hi Eric,

It should be straightforward to convert categorical response variables into dummy covariates in a form that run_model will be happy about. For example, the pandas get_dummies function can transform a categorical variable to a matrix of 1's and 0's. Let me know if that answers your question.
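
As a minimal sketch of the conversion described above: the cell names and state labels here are made up for illustration, and nothing in this snippet calls run_model itself. It only shows the shape of the matrix that pandas get_dummies produces.

```python
import pandas as pd

# Hypothetical cell-state labels, one per cell (for example, Infomap cluster IDs).
states = pd.Series(
    ["state_0", "state_1", "state_2", "state_1"],
    index=["cell_a", "cell_b", "cell_c", "cell_d"],
    name="cell_state",
)

# One column per state; each row contains a single 1 marking that cell's state.
dummy_matrix = pd.get_dummies(states, dtype=int)
print(dummy_matrix)
```

Each row sums to 1, so the columns are mutually exclusive indicators rather than independent measurements.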


eric....@gmail.com

Aug 16, 2017, 10:33:38 AM
to Perturb-seq
Hi Atray,
Thanks for the quick response! 

I work with R often, so I am aware of tools to create design matrices from categorical variables. But it seems questionable to use those design matrices as training output for a linear model. I want to use the Infomap cell states as outputs, so is run_model's Y argument intended to accept values like dummy_matrix?
Best,
Eric

Atray Dixit

Aug 16, 2017, 12:51:25 PM
to Perturb-seq
Hi Eric,

I see what you mean. If your dependent variables (outputs) are truly binary, then logistic regression is more appropriate than linear regression. In practice, I think you should still get somewhat reasonable results with run_model as is, though you might have to tune the parameters by cross-validation.
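
To make the comparison concrete, here is a rough sketch of both options on synthetic data. It uses scikit-learn directly rather than run_model, and the data, variable names, and parameter grid are all assumptions for illustration. Ridge stands in for a regularized linear model and is not meant to reproduce what run_model does internally.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: 300 cells, 5 binary covariates, 3 cell states.
n_cells, n_covariates, n_states = 300, 5, 3
X = rng.integers(0, 2, size=(n_cells, n_covariates)).astype(float)
true_weights = rng.normal(scale=2.0, size=(n_covariates, n_states))
# Adding Gumbel noise and taking the argmax samples labels from a softmax model.
noisy_scores = X @ true_weights + rng.gumbel(size=(n_cells, n_states))
labels = np.argmax(noisy_scores, axis=1)

# Option 1: logistic regression on the labels, with the regularization
# strength C chosen by 5-fold cross-validated accuracy.
cv_accuracy = {}
for C in (0.01, 0.1, 1.0, 10.0):
    clf = LogisticRegression(C=C, max_iter=1000)
    cv_accuracy[C] = cross_val_score(clf, X, labels, cv=5).mean()
best_C = max(cv_accuracy, key=cv_accuracy.get)

# Option 2: linear regression on the one-hot (dummy) matrix, then assign
# each cell to the state with the largest predicted value.
Y = np.eye(n_states)[labels]
Y_pred = cross_val_predict(Ridge(alpha=1.0), X, Y, cv=5)
linear_accuracy = (Y_pred.argmax(axis=1) == labels).mean()

print("logistic regression, best C:", best_C, "accuracy:", cv_accuracy[best_C])
print("ridge on dummy matrix, accuracy:", linear_accuracy)
```

The linear fit can predict values outside [0, 1], so its outputs are better read as scores to rank states than as probabilities.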

Best,
Atray