Allele specific mapping strategies with STAR

Matteo Di Bernardo

Jan 26, 2023, 8:45:51 AM
to rna-star

Hello all,

I am working with diploid cells derived from a cross of two mouse species that differ by about one SNP every ~60 bp. We are trying to optimize an allele-specific mapping and featureCounts pipeline that quantifies gene expression at the single-cell level, both to identify genes that deviate from 50/50 biallelic expression and to get accurate counts from these experiments. We are working with 10x data (both 5' and 3' experiments, all single read, with 5' R1 being antibody capture and 3' R1 cell multiplexing to ID cell types). We originally began our analysis using Cellranger, but we are concerned that it does not let us adjust the parameters that control mismatch thresholds and, in turn, how multimapped reads are handled.

Our strategy (also based on discussions in this group) was to create a merged genome by renaming chromosomes and concatenating (concatenate.sh attached). We attempted mapping using STARsolo for a 3' experiment (just one lane) with the following call:

STAR --runThreadN 16 \
    --runMode alignReads \
    --genomeDir mm_sp_melded_STAR_output \
    --readFilesIn ${FASTQ}L458_898_S2_L001_R2_001.fastq.gz ${FASTQ}L458_898_S2_L001_R1_001.fastq.gz \
    --outFilterMismatchNoverReadLmax 0 \
    --outSAMtype BAM Unsorted \
    --outFilterMultimapNmax 1 \
    --soloType Droplet \
    --outReadsUnmapped Fastx \
    --outMultimapperOrder Random \
    --soloCBwhitelist whitelist/3_V3.txt \
    --soloUMIlen 12 \
    --soloCBlen 16 \
    --soloUMIstart 17 \
    --readFilesCommand gunzip -c

We focused on two settings: outFilterMultimapNmax set to 1, to bar multimapped reads, and outFilterMismatchNoverReadLmax set to 0, to ensure that reads match perfectly given how similar the two genomes are. Our reasoning is that if this value is above 0, the aligner could treat a SNP that differentiates the genomes as a potential sequencing error and map the read to the incorrect genome.

As expected, there are issues with this, specifically: % of reads unmapped: too short | 48.64%. I assume this comes from outFilterMismatchNoverReadLmax being set to 0, so that reads which align over only a small portion of their length cannot be mapped at this stringency. Is there any way around this? Are there strategies that people have found successful for using STAR to map diploid cells with differing parental references? I'll also attach our Log.final.out file here. Appreciate the help, thanks in advance,

Matteo

Log.final.out
concatenate.sh
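
For readers without the attachment, the rename-and-concatenate step described in the post might look roughly like the sketch below. This is not the contents of concatenate.sh: the strain tags (A, B), the file names, and the helper prefix_fasta are all hypothetical, and the two tiny FASTA files are made-up stand-ins so the sketch runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prefix every FASTA header in a file with a strain tag, e.g. ">chr1" -> ">A_chr1".
# Only lines starting with ">" are changed; sequence lines pass through untouched.
# (prefix_fasta is a hypothetical helper, not part of STAR.)
prefix_fasta() {
    local tag="$1" fasta="$2"
    sed "s/^>/>${tag}_/" "$fasta"
}

# Tiny stand-in genomes so the sketch is self-contained (made-up sequences).
workdir="$(mktemp -d)"
printf '>chr1\nACGTACGT\n>chr2\nTTGGCCAA\n' > "$workdir/strainA.fa"
printf '>chr1\nACGTACTT\n>chr2\nTTGGCTAA\n' > "$workdir/strainB.fa"

# Rename the chromosomes of each parental genome, then concatenate
# both into a single merged reference.
merged="$workdir/merged.fa"
{
    prefix_fasta A "$workdir/strainA.fa"
    prefix_fasta B "$workdir/strainB.fa"
} > "$merged"

# List the chromosome names in the merged reference.
grep '^>' "$merged"
```

If an annotation file is used with the merged genome, its chromosome names would need the same prefixes so that they match the renamed FASTA headers.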

Alexander Dobin

Feb 6, 2023, 4:03:38 PM
to rna-star
Hi Matteo,

I think the no-mismatch requirement is not needed - STAR chooses the best-scoring alignment for the read, so alignments with extra mismatches due to SNPs are discarded.
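
Following this suggestion, the original call could be rerun with the zero-mismatch filter simply dropped, which leaves outFilterMismatchNoverReadLmax at its default. The sketch below keeps every other flag exactly as posted; whether outFilterMultimapNmax 1 should also be relaxed is not addressed in the reply, so it is retained here as an assumption. The command is assembled in an array and only printed, so it can be inspected before running.

```shell
#!/usr/bin/env bash
set -euo pipefail

# FASTQ is the path prefix used in the original post; left empty if unset.
FASTQ="${FASTQ:-}"

# Same flags as the original call, minus "--outFilterMismatchNoverReadLmax 0".
star_cmd=(
    STAR --runThreadN 16
    --runMode alignReads
    --genomeDir mm_sp_melded_STAR_output
    --readFilesIn "${FASTQ}L458_898_S2_L001_R2_001.fastq.gz" "${FASTQ}L458_898_S2_L001_R1_001.fastq.gz"
    --outSAMtype BAM Unsorted
    --outFilterMultimapNmax 1
    --soloType Droplet
    --outReadsUnmapped Fastx
    --outMultimapperOrder Random
    --soloCBwhitelist whitelist/3_V3.txt
    --soloUMIlen 12
    --soloCBlen 16
    --soloUMIstart 17
    --readFilesCommand gunzip -c
)

# Dry run: print the assembled command. To execute it, run: "${star_cmd[@]}"
printf '%s ' "${star_cmd[@]}"; echo
```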