Problem with the number of detected sgRNAs per cell

orewa...@gmail.com

Jun 6, 2018, 4:56:30 AM
to Perturb-seq
Hi Atray,
I downloaded the processed data from GEO and tried to convert the GSM2396856_dc_3hr_cbc_gbc_dict_strict.csv file into an X matrix with the function mimosca.dict2X that you provided on GitHub. The number of cells with 0 detected sgRNAs per cell that I got is very different from Figure 1D in the paper (see the attached jpg).
Is this result correct, or did I apply the function incorrectly?

Thanks for your help!
Dongyang
Histogram.Comparison.jpg
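A minimal sketch of how a per-cell sgRNA count histogram can be derived from a guide-to-cell dictionary. It assumes the CSV has already been parsed into a Python dict mapping each guide name to the cell barcodes it was detected in; that parsed layout and the helper names `guides_per_cell` and `histogram` are hypothetical, and this is not a reimplementation of mimosca.dict2X. One point worth checking: a guide-to-cell dictionary only lists cells in which some guide was seen, so the size of the 0-sgRNA bin depends entirely on which cell barcodes you treat as the full set of cells (for example, the cells in the expression matrix).

```python
from collections import Counter
from typing import Dict, Iterable, List


def guides_per_cell(guide_to_cells: Dict[str, List[str]],
                    all_cells: Iterable[str]) -> Dict[str, int]:
    """Count distinct guides detected in each cell.

    guide_to_cells: hypothetical parsed form of a cbc_gbc dictionary,
        guide name -> list of cell barcodes carrying that guide.
    all_cells: the full set of cell barcodes; cells absent from the
        dictionary are kept with a count of 0.
    """
    counts = {cell: 0 for cell in all_cells}
    for guide, cells in guide_to_cells.items():
        for cell in set(cells):  # ignore duplicate barcodes within one guide
            if cell in counts:   # skip barcodes outside the chosen cell set
                counts[cell] += 1
    return counts


def histogram(counts: Dict[str, int]) -> Dict[int, float]:
    """Fraction of cells with k detected guides, for each observed k."""
    tally = Counter(counts.values())
    n = len(counts)
    return {k: tally[k] / n for k in sorted(tally)}


if __name__ == "__main__":
    toy = {"g1": ["A", "B"], "g2": ["B"], "g3": ["C"]}
    counts = guides_per_cell(toy, ["A", "B", "C", "D"])
    print(histogram(counts))  # {0: 0.25, 1: 0.5, 2: 0.25}
```

Passing only the barcodes that appear in the dictionary as `all_cells` makes the 0 bin empty by construction, which is one way two analyses of the same file can disagree.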

Atray Dixit

Jun 7, 2018, 1:46:55 PM
to Perturb-seq
In general, we used the file with the suffix "lenient" for the analysis of single sgRNA effects. The strict file was used to mitigate concerns that the cells we observed to have two or more sgRNAs were an artifact of PCR bias. This was a particular concern when we looked at interactions.

orewa...@gmail.com

Jun 14, 2018, 10:42:57 AM
to Perturb-seq
Thanks for the reply. 

I then looked into both files (lenient and strict). Both of them have a lot of cells with 0 detected sgRNAs, which differs from Fig 1D in the paper. I assumed that you removed all the cells with 0 detected sgRNAs and treated all the non-targeting control guides as 0 detected sgRNAs. With that, the plots look closer, but the proportions are still slightly different (as the attached file shows). So what are the cells with 0 detected sgRNAs in Fig 1D?
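The convention guessed at above can be written down explicitly so that it is easy to compare against the figure. A minimal sketch, assuming the dictionary is parsed into guide name -> cell barcodes and that control guides can be recognized by a predicate; the function name and the `NTC` prefix in the example are hypothetical. Cells with no detected guide are dropped, and non-targeting guides are not counted, so cells carrying only controls fall into the 0 bin.

```python
from collections import Counter
from typing import Callable, Dict, List


def recount_without_controls(guide_to_cells: Dict[str, List[str]],
                             is_control: Callable[[str], bool]) -> Dict[int, float]:
    """sgRNA/cell distribution under the assumed convention:

    1. keep only cells with at least one detected guide of any kind;
    2. count only targeting guides, so control-only cells land in bin 0.
    """
    targeting: Counter = Counter()
    detected = set()
    for guide, cells in guide_to_cells.items():
        for cell in set(cells):
            detected.add(cell)
            if not is_control(guide):
                targeting[cell] += 1
    # Counter returns 0 for cells that only carried control guides
    tally = Counter(targeting[cell] for cell in detected)
    n = len(detected)
    return {k: tally[k] / n for k in sorted(tally)}


if __name__ == "__main__":
    toy = {"NTC_1": ["A", "B"], "geneX_1": ["B", "C"], "geneY_1": ["C"]}
    # A -> 0 (control only), B -> 1, C -> 2; each bin holds a third of the cells
    print(recount_without_controls(toy, lambda g: g.startswith("NTC")))
```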

Did you remove the cells with 0 detected sgRNAs from X when you trained the linear model?
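A minimal sketch of the filtering step this question is about, assuming X is a cell-by-guide 0/1 design matrix (one row per cell) and Y is the matching cell-by-gene expression matrix, both as plain Python lists; the function name is hypothetical and this is not necessarily what the authors did. Roughly speaking, in a linear model with an intercept, a cell whose X row is all zeros has a fitted value equal to the intercept, so such cells act as the baseline; if they are dropped, some other reference (for example, non-targeting guides) has to play that role.

```python
from typing import List, Sequence, Tuple


def drop_zero_guide_cells(
    X: Sequence[Sequence[int]],
    Y: Sequence[Sequence[float]],
    cells: Sequence[str],
) -> Tuple[List[List[int]], List[List[float]], List[str]]:
    """Drop cells whose design-matrix row has no detected guide.

    X, Y and cells are row-aligned (one entry per cell); the same row
    indices are removed from all three so they stay aligned.
    """
    if not (len(X) == len(Y) == len(cells)):
        raise ValueError("X, Y and cells must have the same number of rows")
    keep = [i for i, row in enumerate(X) if any(row)]
    return ([list(X[i]) for i in keep],
            [list(Y[i]) for i in keep],
            [cells[i] for i in keep])


if __name__ == "__main__":
    X = [[1, 0], [0, 0], [0, 1]]
    Y = [[2.0, 1.0], [0.5, 0.5], [1.0, 3.0]]
    Xf, Yf, kept = drop_zero_guide_cells(X, Y, ["A", "B", "C"])
    print(kept)  # ['A', 'C']
```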

Can you share the cell state assignment results for the PBMDC 3hr LPS data? In the mmc3.xlsx file you include only 6 states in the model, but there are 7 clusters in mmc2.xlsx and Fig 3C,D.

Comparison.jpg

Atray Dixit

Jun 14, 2018, 10:54:25 AM
to Perturb-seq
I think you might be on to something. If you look at Supplementary Figure S1A, you can see that the probability of seeing 0 sgRNAs in the PMDCs is close to 0.25, as you are getting in your figures for the lenient dictionary, whereas the K562 TF distribution is closer to that of Figure 1D. I believe Figure 1D is mislabeled as PMDCs when it actually shows the distribution corresponding to K562.

I'll confirm this and try to add a corrigendum.

Thanks for checking so closely! 

Atray Dixit

Jun 14, 2018, 10:59:15 AM
to Perturb-seq
I'll double check on cell states later today.

Atray Dixit

Jun 14, 2018, 11:27:41 AM
to Perturb-seq
This should be it:
dc3hr_LM_cells_w7.csv

orewa...@gmail.com

Jun 18, 2018, 2:15:15 AM
to Perturb-seq
Thanks, Atray! Fig S1 is indeed closer to my lenient barplot.

By the way, I want to do some analysis on the cell states. 
What do the clustering results on the BMDC 3hr unperturbed data look like? Would k-means results be very different from the Infomap results?
Can you share the cell state assignment results for the PBMDC 3hr LPS data (perturbed and unperturbed)?

Atray Dixit

Jun 18, 2018, 11:41:28 AM
to Perturb-seq
The results will be slightly different. I think Infomap is less prone to some of the common failure modes of k-means, such as those associated with a variable number of observations per cluster.
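One standard way to put a number on "slightly different" when comparing two clusterings of the same cells (for example, k-means versus Infomap state labels) is the adjusted Rand index. A minimal standard-library sketch, not something the authors described using: 1.0 means the partitions are identical up to relabeling, values near 0 mean chance-level agreement, and negative values mean worse than chance.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable],
                        labels_b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two label assignments of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    pairs = Counter(zip(labels_a, labels_b))  # contingency table entries n_ij
    rows = Counter(labels_a)                  # row sums a_i
    cols = Counter(labels_b)                  # column sums b_j
    index = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in rows.values())
    sum_b = sum(comb(v, 2) for v in cols.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in each
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    # same partition, different label names
    print(adjusted_rand_index(["s1", "s1", "s2", "s2"], [3, 3, 7, 7]))  # 1.0
```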

Here is the cell state assignment for the PBMDC 3hr perturbed dataset. The unperturbed assignments might be slightly trickier to dig up: I only used them to train a classifier for the class probabilities in the perturbed dataset (attached here), so I'd have to regenerate them, and they probably wouldn't line up 1:1.

Hope that helps!
Atray


dc_3hr_classprobs_f.csv
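If the attached file holds per-cell class probabilities, a hard state assignment can be taken as the most probable state for each cell. A minimal sketch; the column layout (a header row, the cell barcode in the first column, one probability column per state) is an assumption about the file, not something confirmed in this thread, and `hard_assign` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def hard_assign(handle: TextIO) -> Dict[str, Tuple[str, float]]:
    """Map each cell to (most probable state, its probability).

    Assumed layout: header row; first column = cell barcode;
    remaining columns = one probability per cell state.
    """
    reader = csv.reader(handle)
    header = next(reader)
    states = header[1:]
    assignments = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        cell, probs = row[0], [float(x) for x in row[1:]]
        best = max(range(len(probs)), key=probs.__getitem__)
        assignments[cell] = (states[best], probs[best])
    return assignments


if __name__ == "__main__":
    demo = io.StringIO("cell,state1,state2\nAAAC,0.9,0.1\nTTTG,0.3,0.7\n")
    print(hard_assign(demo))  # {'AAAC': ('state1', 0.9), 'TTTG': ('state2', 0.7)}
```

Keeping the winning probability alongside the label makes it easy to set aside low-confidence cells before comparing states across conditions.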

orewa...@gmail.com

Jun 19, 2018, 4:00:38 AM
to Perturb-seq
Thanks for all your replies! These technical details are really helpful.

Best wishes,
Dongyang