EM order of operations

Jesse Niehaus

Dec 20, 2023, 6:07:34 PM

to rna-star

Hi Alex,

Related to implementing the EM algorithm, could you please clarify whether this is accurate?

STARsolo assigns reads to cell barcodes prior to gene mapping. Therefore, the EM step distributes multi-gene (multimapped) reads on a 'per-cell' basis (based on the uniquely mapped reads in that cell rather than the uniquely mapped reads across all cells in the sample).
Example: If a cell has uniquely mapped reads in Gene X but not Gene Y, then a multimapped read aligning to both X and Y will get assigned to Gene X but not Gene Y (even if other cells have reads that uniquely map to Gene Y). Hope that makes sense.
I believe Alevin's order is similar to this whereas Kallisto-Bustools aligns first and assigns reads to CBs second, which in our example would result in our read getting distributed to both Gene X and Y based on the sample-wide distribution of reads aligned to X and Y.

And a related question:
Is there a way to keep multi-gene multimapped reads only if there are 'supporting' uniquely mapped reads in the corresponding genes, but throw them out when there aren't any uniquely mapped reads to support it? In the example above, if a cell does not have unique reads in either Gene X or Gene Y, then the multi-mapped reads get thrown out.

Thank you for your time,
Jesse

Alexander Dobin

Dec 22, 2023, 2:47:54 PM

to rna-star

Hi Jesse,

Your description for STARsolo is correct: it performs multi-gene re-distribution for each cell independently of others.
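
To make the per-cell versus sample-wide difference concrete, here is a minimal sketch of a generic EM re-distribution of multi-gene reads. It is not STARsolo's actual code: the function name `em_redistribute`, the dict/tuple inputs, and the fixed iteration count are all assumptions made for illustration. It only shows that running the same procedure on one cell's counts and on pooled counts gives different answers for the Gene X / Gene Y example.

```python
def em_redistribute(unique_counts, multi_reads, n_iter=100):
    """Generic EM sketch (hypothetical helper, not STARsolo's implementation).

    unique_counts: dict gene -> number of uniquely assigned reads
    multi_reads:   list of tuples, each listing the genes one read is compatible with
    Returns dict gene -> unique count plus expected share of multi-gene reads.
    """
    genes = set(unique_counts)
    for read_genes in multi_reads:
        genes.update(read_genes)

    # Start from equal abundances so every compatible gene gets an initial share.
    abundance = {g: 1.0 for g in genes}

    for _ in range(n_iter):
        # Unique reads are fixed; only multi-gene reads are re-distributed.
        expected = {g: float(unique_counts.get(g, 0)) for g in genes}
        for read_genes in multi_reads:
            total = sum(abundance[g] for g in read_genes)
            for g in read_genes:
                # E-step: split the read in proportion to current abundances.
                expected[g] += abundance[g] / total
        # M-step: the expected counts become the new abundances.
        abundance = expected
    return abundance


# One multi-gene read compatible with X and Y.
multi = [("X", "Y")]

# Per-cell: this cell has unique reads in X only, so the read ends up on X.
per_cell = em_redistribute({"X": 5, "Y": 0}, multi)

# Pooled: other cells contribute unique reads to Y, so the read is split.
pooled = em_redistribute({"X": 5, "Y": 5}, multi)

print(per_cell)
print(pooled)
```

In the per-cell run the share given to Y shrinks at every iteration, so essentially the whole read goes to X; in the pooled run the read stays split evenly between X and Y.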

>>>Is there a way to keep multi-gene multimapped reads only if there are 'supporting' uniquely mapped reads in the corresponding genes, but throw them out when there aren't any uniquely mapped reads to support it? In the example above, if a cell does not have unique reads in either Gene X or Gene Y, then the multi-mapped reads get thrown out.

There is no option for this, but I think you can do it by using the unique-only matrix: for each cell, if a gene has no unique counts, you zero out its unique+multi count.
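
A minimal sketch of that masking step, assuming both matrices have already been loaded as genes x cells nested lists with identical ordering. The function name `mask_unsupported` and the toy numbers are hypothetical; real matrices are large and sparse, so in practice the same element-wise mask would be applied with a sparse-matrix library rather than Python lists.

```python
def mask_unsupported(unique, unique_multi):
    """Zero every unique+multi entry whose unique-only entry is 0.

    unique:       genes x cells counts from the unique-only matrix
    unique_multi: genes x cells counts from the unique+multi matrix
    Both inputs are assumed to share the same gene and cell order.
    """
    if len(unique) != len(unique_multi):
        raise ValueError("matrices must have the same number of genes")
    masked = []
    for u_row, m_row in zip(unique, unique_multi):
        if len(u_row) != len(m_row):
            raise ValueError("matrices must have the same number of cells")
        # Keep the unique+multi value only where unique support exists.
        masked.append([m if u > 0 else 0.0 for u, m in zip(u_row, m_row)])
    return masked


# Toy data: rows are Gene X and Gene Y, columns are cell 1 and cell 2.
unique = [[5, 0],
          [0, 0]]
unique_multi = [[6.0, 0.5],
                [0.0, 0.5]]

filtered = mask_unsupported(unique, unique_multi)
print(filtered)  # [[6.0, 0.0], [0.0, 0.0]]
```

Cell 2 has no unique reads in either gene, so its multi-gene counts are dropped entirely. Note that the mask works per gene: the zeroed counts are discarded, not re-assigned to genes that do have unique support.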

Cheers

Alex
