Hi,
Thanks for this excellent work and for making the data available! I have some questions about the maths and computations.
1. Do your linear models include an intercept? If so, is it set so that the intercept equals the mean expression of unperturbed cells (cells without sgRNAs) or of cells with a control guide?
2. I am also really curious about how you determine the probability that a cell has been perturbed. Would you mind explaining this process? If I understood the section in the supplement correctly, the final equation gives the probability that cell j is perturbed, P(Xj = 1), but could you explain the meaning of the variables? I presume that Y is the observed expression level of gene i (and Y-hat the predicted value), Beta is the coefficient vector for that gene (its column in the coefficient matrix of the main linear model), and X0 is the cell's row of the design matrix with the 1 for the tested sgRNA replaced by 0. Is that correct? Also, which standard deviation does sigma correspond to? If you have some example code, it would be hugely helpful.
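To make my reading concrete, here is a minimal sketch of how I would compute it. It assumes Gaussian noise with a per-gene residual standard deviation sigma, independence across genes and equal prior odds for "perturbed" vs "not perturbed". The function name `perturbation_probability` and its argument layout are my own (hypothetical), not taken from your code, so please correct anything that differs from your implementation.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit


def perturbation_probability(y, x, B, sigma, guide_idx):
    """Probability that cell j is perturbed by the guide in column guide_idx.

    Hypothetical helper illustrating my reading of the supplement:
    y         : observed expression of cell j, shape (n_genes,)
    x         : cell j's row of the design matrix, shape (n_covariates,)
    B         : coefficient matrix, shape (n_covariates, n_genes)
    sigma     : per-gene residual standard deviation, shape (n_genes,)  [assumption]
    guide_idx : column of x encoding the sgRNA being tested
    """
    x1 = np.asarray(x, dtype=float)   # observed row: the tested guide is present (1)
    x0 = x1.copy()
    x0[guide_idx] = 0.0               # X0: same row with the tested guide's 1 set to 0
    y_hat1 = x1 @ B                   # predicted expression if perturbed
    y_hat0 = x0 @ B                   # predicted expression if not perturbed
    # Gaussian log-likelihoods summed over genes (genes assumed independent)
    ll1 = norm.logpdf(y, loc=y_hat1, scale=sigma).sum()
    ll0 = norm.logpdf(y, loc=y_hat0, scale=sigma).sum()
    # Equal priors: P = L1 / (L1 + L0) = sigmoid(ll1 - ll0), computed stably
    return expit(ll1 - ll0)


# Toy example with synthetic data: intercept row plus one guide effect
rng = np.random.default_rng(0)
n_genes = 50
B = np.vstack([np.full(n_genes, 5.0),   # intercept
               np.full(n_genes, 2.0)])  # effect of the tested guide
sigma = np.full(n_genes, 0.5)
x = np.array([1.0, 1.0])                # intercept + guide present
y_hit = x @ B + rng.normal(0.0, sigma)  # cell that responded to the guide
y_miss = B[0] + rng.normal(0.0, sigma)  # cell with the guide but no effect
print(perturbation_probability(y_hit, x, B, sigma, guide_idx=1))   # expected close to 1
print(perturbation_probability(y_miss, x, B, sigma, guide_idx=1))  # expected close to 0
```

Working in log space with `expit` avoids the underflow you would get from multiplying many small Gaussian densities across genes.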
Many thanks in advance,
Iwo