mismatches allowed in STARsolo


Matt Wilken

Nov 23, 2022, 2:32:56 PM
to rna-star
Hi, my specific question is whether there is a way to alter the number of allowed mismatches to the reference genome in STARsolo (as you can in STAR)?

CONTEXT BELOW:
I want to map single-cell RNA-seq reads (10x Chromium, 90 bp reads) to a protein variant library (partially endogenous to the human genome). Each variant has 12 randomized codons (36 bp), interspersed or clustered over 190 bp. Each cell should carry one specific protein variant.

Ultimately, I don't want quantification; I want to recover one contiguous sequence of the protein, encompassing the variant regions, per cell in the sequencing library.

I've generated augmented genome FASTA/GTF files; however, I think I need to modify the number of allowed mismatches during mapping/alignment to have any hope of mapping reads with so many variable positions. (Also, would it help to pre-filter the reads, keeping only those that do not map perfectly to the native human genome?)

I do not see how to change the allowed mismatches given the parameters to STARsolo. Do you have any advice here, or a suggestion for an entirely different approach?

Thank you very much for your time,
Matt

Alexander Dobin

Nov 23, 2022, 2:37:37 PM
to rna-star
Hi Matt,

The maximum number of mismatches per read is controlled by --outFilterMismatchNmax (default 10) and --outFilterMismatchNoverLmax (default 0.3, normalized to the mapped read length).
If the number of mismatches exceeds these thresholds, the read will be soft-clipped. To avoid soft-clipping, use --alignEndsType EndToEnd.
However, STAR will not map reads in which the number of mismatches exceeds 3-5% of the read length.
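
As a rough illustration of how the two filters above interact, here is a minimal Python sketch. It assumes an alignment has to pass both filters, so the effective cap is the smaller of the absolute limit and the fraction of the mapped length; STAR's exact internal rounding may differ. The helper names (`mismatch_cap`, `star_solo_command`) and all file paths are hypothetical placeholders, and the STARsolo options shown (`--soloType CB_UMI_Simple`, `--soloCBwhitelist`) are just one plausible 10x setup, not something prescribed in this thread.

```python
import math
import shlex


def mismatch_cap(mapped_length, n_max=10, n_over_l_max=0.3):
    """Approximate per-read mismatch cap.

    Assumption: a read must satisfy both --outFilterMismatchNmax and
    --outFilterMismatchNoverLmax, so the smaller limit is the binding one.
    """
    return min(n_max, math.floor(n_over_l_max * mapped_length))


def star_solo_command(genome_dir, cdna_fastq, barcode_fastq, whitelist,
                      n_max=10, n_over_l_max=0.3, end_to_end=False):
    """Build a STAR argument list with STARsolo enabled and the mismatch filters set."""
    cmd = [
        "STAR",
        "--genomeDir", genome_dir,
        # For 10x data the cDNA read comes first, then the barcode/UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        "--outFilterMismatchNmax", str(n_max),
        "--outFilterMismatchNoverLmax", str(n_over_l_max),
    ]
    if end_to_end:
        # Disable soft-clipping of read ends.
        cmd += ["--alignEndsType", "EndToEnd"]
    return cmd


if __name__ == "__main__":
    # With the defaults, the absolute limit (10) is what binds for a 90 bp read.
    print(mismatch_cap(90))
    # Raising only the absolute limit leaves the fractional limit in control.
    print(mismatch_cap(90, n_max=30))

    # Placeholder paths; substitute your own augmented genome index and FASTQs.
    cmd = star_solo_command(
        genome_dir="augmented_genome_index",
        cdna_fastq="sample_R2.fastq",
        barcode_fastq="sample_R1.fastq",
        whitelist="barcode_whitelist.txt",
        n_max=30,
        end_to_end=True,
    )
    print(shlex.join(cmd))
```

Under this assumption, raising --outFilterMismatchNmax alone only helps up to the fractional limit, so both parameters may need adjusting together. Even then, the 3-5% caveat above still applies.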

Cheers
Alex
