Guidelines for UMI thresholding for sgRNA attribution to cells

Paul Klein
Oct 6, 2022, 12:26:08 PM
to Perturb-seq

Hello everyone,

I am very new to Perturb-seq and would like some general advice on assigning sgRNA guides to cells.

I ran cellranger count for the alignment and realized that, by default, cellranger filters feature reads based on a UMI threshold it computes for each sgRNA. These thresholds vary widely from one sgRNA to another.

I cannot find any documentation on how the threshold is computed, but I guess it relies on the number of times the guide is detected and on its UMI distribution to set a sensible signal-to-noise cut-off.

In your experience, should I rely "blindly" on the cellranger algorithm? Or should I keep every sgRNA for which at least one UMI is found in a given cell and compute a threshold myself (by looking at the distribution of UMIs per sgRNA per cell)?
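A minimal sketch of the second option, assuming you already have a cells × guides table of raw UMI counts as a pandas DataFrame and have chosen the thresholds yourself (one number for all guides, or one per guide). `assign_guides` is a hypothetical helper written for illustration; it is not part of cellranger or scanpy.

```python
import numpy as np
import pandas as pd


def assign_guides(counts: pd.DataFrame, thresholds) -> pd.DataFrame:
    """Call guides per cell from raw UMI counts (hypothetical helper).

    counts: cells x guides DataFrame of raw UMI counts.
    thresholds: one number for every guide, or a dict/Series mapping
    guide name -> minimum UMI count. Guides missing from the mapping
    (or mapped to None) are never called.
    """
    if np.isscalar(thresholds):
        thr = pd.Series(float(thresholds), index=counts.columns)
    else:
        thr = pd.Series(thresholds, dtype=float).reindex(counts.columns)

    # A guide is "called" in a cell when its UMI count reaches that guide's threshold.
    called = counts.ge(thr, axis=1)
    n_called = called.sum(axis=1)

    # Keep a guide name only for cells with exactly one called guide.
    guide = called.astype(int).idxmax(axis=1).where(n_called == 1)
    status = np.select([n_called == 0, n_called == 1], ["none", "single"], default="multiple")
    return pd.DataFrame(
        {"n_guides": n_called, "guide": guide, "status": status}, index=counts.index
    )
```

The per-guide thresholds could come from inspecting each guide's UMI histogram by eye, or from a model-based method such as the ones discussed later in this thread.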

Many thanks for your help!
Best,
Paul

Paul Klein
Oct 13, 2022, 4:18:05 AM
to Perturb-seq
Hello everyone,

Thanks for your first answers. I am planning to test this notebook: https://github.com/josephreplogle/guide_calling
It is referenced in "Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq" (Cell, 2022).
It computes one threshold per sgRNA based on the distribution of UMIs per sgRNA per cell.

If you have any other suggestions or comments, I'll be glad to read them!

Best,
Paul

Martha Liliana Serrano Serrano
Oct 14, 2022, 4:33:06 AM
to Perturb-seq
Hi Paul, 

I wonder whether you have considered this approach, or whether other researchers have any experience with it or a preference?
Best, 
Martha

Paul Klein
Oct 21, 2022, 5:48:09 AM
to Perturb-seq
Hello Martha,

Thank you for your suggestion; I'll definitely have a look at it!

Something I recently realized as a beginner in Perturb-seq analysis is that when you use 10x's "cellranger count" for alignment, the sgRNA calls per cell you get are already filtered. The UMI threshold for each sgRNA is computed internally by fitting a two-component Gaussian mixture model to that guide's log(1 + UMIs per cell) distribution, and the threshold is set to separate the background from the informative signal.
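To make the idea concrete, here is a minimal sketch of a per-guide threshold from a two-component Gaussian mixture on log(1 + UMI), using scikit-learn's `GaussianMixture`. It is not cellranger's actual code: fitting only the cells where the guide has at least one UMI, the `min_cells` cut-off, and the 0.5 posterior cut-off are all assumptions made for illustration, and `umi_threshold_gmm` is a hypothetical name.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def umi_threshold_gmm(umi_counts, min_cells=10, random_state=0):
    """Estimate a UMI threshold for ONE guide from its per-cell UMI counts.

    Returns the smallest integer UMI count that the higher-mean ("signal")
    component claims with posterior > 0.5, or None if no threshold is found.
    """
    counts = np.asarray(umi_counts, dtype=float)
    # Assumption: fit only cells in which the guide was detected at all.
    counts = counts[counts > 0]
    if counts.size < min_cells:
        return None

    # Two-component Gaussian mixture on log(1 + UMI), as described in the post.
    x = np.log1p(counts).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=random_state).fit(x)
    signal = int(np.argmax(gmm.means_.ravel()))

    # Scan integer UMI values upward and return the first one assigned to the signal component.
    grid = np.arange(1, int(counts.max()) + 1)
    p_signal = gmm.predict_proba(np.log1p(grid).reshape(-1, 1))[:, signal]
    hits = grid[p_signal > 0.5]
    return int(hits[0]) if hits.size else None
```

Given a cells × guides DataFrame `counts` of raw UMIs, one threshold per guide could then be computed with `{g: umi_threshold_gmm(counts[g]) for g in counts.columns}`. A `None` result means the mixture could not separate signal from background for that guide, which is worth inspecting by hand rather than forcing a cut-off.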

To access the raw guide UMI data, use the "raw_feature_bc_matrix" from the output folder, and make sure you set the "gex_only" parameter to False when loading the data with scanpy (https://scanpy.readthedocs.io/en/stable/generated/scanpy.read_10x_mtx.html). This gives you the raw UMI counts per sgRNA per cell, so you can experiment with different UMI thresholding methods :)

Hope this helps some people!

Best,
Paul

Catherine Z
May 16, 2023, 3:29:35 AM
to Perturb-seq
Hi, I find that scAR just directly compares the UMI counts of several sgRNAs ...