max mismatch parameter issue

58 views
Skip to first unread message

Federico Ansaloni

unread,
Jul 3, 2018, 1:05:33 PM7/3/18
to rna-star
Hi,

I'm using STAR2.6.0c to map SE 50bp reads against a reference genome. I would like to avoid soft clipping and to allow max 1 mismatch.

I'm using the following command:
STAR   --runThreadN 20   --genomeDir /path/to/dir   --readFilesIn /path/to/dir/sample.fastq.gz      --readFilesCommand
 zcat      --outStd BAM_Unsorted   --outMultimapperOrder Random   --outSAMtype BAM   Unsorted      --outSAMunmapped None      --outSAMprimaryFlag AllBestScore   --outFilterMismat
chNmax 1   --alignEndsType EndToEnd

I'm expecting the nM:i to be nM:i:0 or nM:i:1 but in the output is reported a little amount of reads (~0.01%) with nM:i:2,3,4,5,6,7,8,9.
I report below an example extract from my output file: 

SRR3170296.1133438 0 F21F3.6_transc 805 3 50M * 0 0 CTCAATTTTCGTAGTAATCATTCATCTCCAAAAAAAAAAAAAAGAATAAT BBBFFFFFFFFFFFFFFFIFIIIIIIIIIIIIIIIIIIFFFF77B<<B<B NH:i:2 HI:i:1 AS:i:33 nM:i:8
SRR3170296.4708597 0 Y55F3AM.6a.1_transc 1809 1 50M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:2 AS:i:33 nM:i:8
SRR3170296.4708597 0 Y55F3AM.6a.2_transc 1760 1 50M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:3 AS:i:33 nM:i:8

Interestingly all these 3 alignments have a cigar of 50M..

What's happening? Should I also change other parameters?

Thanks,
Federico

Alexander Dobin

unread,
Jul 5, 2018, 6:04:56 PM7/5/18
to rna-star
Hi Federico,

most likely this happens because these are multimappers, and the best alignment has only one mismatch but many indels.
For multimappers, only the best alignment is checked for the number of mismatches.
Please extract and post all alignments for these reads.

Cheers
Alex

Federico Ansaloni

unread,
Jul 5, 2018, 7:18:34 PM7/5/18
to rna-star
Thanks for your answer.
If I extract all the alignments for these reads I obtain:
SRR3170296.1133438 0 F21F3.6_transc 805 3 50M * 0 0 CTCAATTTTCGTAGTAATCATTCATCTCCAAAAAAAAAAAAAAGAATAAT BBBFFFFFFFFFFFFFFFIFIIIIIIIIIIIIIIIIIIFFFF77B<<B<B NH:i:2 HI:i:1 AS:i:33 nM:i:8
SRR3170296.1133438 0 F21F3.6_transc 805 3 33M4I13M * 0 0 CTCAATTTTCGTAGTAATCATTCATCTCCAAAAAAAAAAAAAAGAATAAT BBBFFFFFFFFFFFFFFFIFIIIIIIIIIIIIIIIIIIFFFF77B<<B<B NH:i:2 HI:i:2 AS:i:33 nM:i:1
SRR3170296.4708597 0 Y55F3AM.6a.1_transc 1809 1 26M4I20M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:1 AS:i:33 nM:i:1
SRR3170296.4708597 0 Y55F3AM.6a.1_transc 1809 1 50M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:2 AS:i:33 nM:i:8
SRR3170296.4708597 0 Y55F3AM.6a.2_transc 1760 1 50M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:3 AS:i:33 nM:i:8
SRR3170296.4708597 0 Y55F3AM.6a.2_transc 1760 1 26M4I20M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:4 AS:i:33 nM:i:1

Looking at the first 2 rows, it seems that the read is mapping against F21F3.6_transc at position 805 in both cases. Only cigar and nM changed. The mapping score is the same so they should be both best alignments, right?
If a read is a multimapper I would like only alignments with max 1 mismatch to be reported. Is there a way to do this?

Thanks,
Federico

Alexander Dobin

unread,
Jul 6, 2018, 10:55:52 AM7/6/18
to rna-star
Hi Federico,

in the current logic, STAR only requires one of the best multimapping alignments to have nM<=outFilterMismatchNmax.
The 33M4I13M alignment satisfies this requirement, so the read passes the filters. Then, all alignments with the score <=bestScore-multScoreRange are reported.
There is no option at the moment to limit mismatches for each multimapping alignment, but you can easily do it after mapping.

Cheers
Alex
Message has been deleted

Federico Ansaloni

unread,
Jul 9, 2018, 6:29:01 AM7/9/18
to rna-star
Thanks for the clarifications. 
I'm sorry to bother you again but looking at these 2 reads I've another doubt..

SRR3170296.1133438 0 F21F3.6_transc 805 3 50M * 0 0 CTCAATTTTCGTAGTAATCATTCATCTCCAAAAAAAAAAAAAAGAATAAT BBBFFFFFFFFFFFFFFFIFIIIIIIIIIIIIIIIIIIFFFF77B<<B<B NH:i:2 HI:i:1 AS:i:33 nM:i:8
SRR3170296.1133438 0 F21F3.6_transc 805 3 33M4I13M * 0 0 CTCAATTTTCGTAGTAATCATTCATCTCCAAAAAAAAAAAAAAGAATAAT BBBFFFFFFFFFFFFFFFIFIIIIIIIIIIIIIIIIIIFFFF77B<<B<B NH:i:2 HI:i:2 AS:i:33 nM:i:1

In my understanding these 2 lines report two different alignments of the same reads (SRR3170296.1133438) against the same transcript (F21F3.6_transc) in the same position (805). It seems that the same read is mapping against the same transcript position in 2 different ways.. What is the meaning of these 2 lines?

Aligning that read against the transcript using clustalw I obtain:

[.....]

SRR3170296.1133438      ------------------------CTCAATTTTCGTAGTAATCATTCATCTCCAAAAAAA

F21F3.6_transc          CCCAAACGTTCGACGAGTTGCTCTCCCAATTTTCGTAGTAATCATTCATCTCCAAAAAAA

                                                * **********************************


SRR3170296.1133438      AAAAAAAGAATAAT----------------------------------------------

F21F3.6_transc          AAAGAATAATTTGAAATACCAAAGTAGATTCCTCACTTCTCAGTCATCTTGTATCCCCCA

                        *** **  * *     

[.....]                                           


Thanks,
Federico

                               

Alexander Dobin

unread,
Jul 13, 2018, 5:06:14 PM7/13/18
to rna-star
Hi Federico,

these alignments indeed are very similar, they start at the same genomic position, however, the 1st one map contiguously (no gaps, CIGAR=50M) with 8MM, while the second one has an insertion of 4 bases (CIGAR=33M4I13M) and one mismatch. It's a coincidence of sorts that these two alignments have equal alignment score AS:i:33, this should happen rarely.

Cheers
Alex
Reply all
Reply to author
Forward
0 new messages