Default Number of mismatches permitted


JC Grenier

Jun 13, 2014, 11:10:54 AM
to rna-...@googlegroups.com
Hi guys, 

I have a general question about the default value of this parameter: outFilterMismatchNmax

Why have you fixed this to 10 by default?

Have you run tests and found that this is an optimal value for the parameter? I'm asking because TopHat, for example, uses 2 by default. I know that Bowtie does not permit more than 3 mismatches, so that may be the reason. Do you recommend setting the parameter to a more reasonable value, such as 2 mismatches?

Thanks a lot. Have a great day!

John Brothers

Jun 16, 2014, 11:29:57 AM
to rna-...@googlegroups.com
I believe Bowtie actually allows for more mismatches. The -n parameter, which defaults to 2, applies to the seed region (the first set of highest-quality bases in the read being aligned). Downstream in the read, it will allow more mismatches depending on the final score of the alignment. I wanted to clarify that setting -n does not impose an absolute maximum of 2 mismatches. If you want to set a maximum number of mismatches, you use -v, which I believe you can set to any number of mismatches (whereas with -n you are limited to 0-3).

JC Grenier

Jun 16, 2014, 11:42:01 AM
to rna-...@googlegroups.com
Thanks for the clarification. I was referring to TopHat specifically. You're right, Bowtie permits more, but TopHat does not: it gives us a beautiful error message if we try to set more.

Alexander Dobin

Jun 16, 2014, 11:48:48 AM
to rna-...@googlegroups.com
Hi JC, John,

The default of --outFilterMismatchNmax 10 mismatches in STAR is quite arbitrary and needs to be adjusted according to your particular situation.
A good way to set it is with --outFilterMismatchNoverLmax, which is scaled to the read length (you would also need to specify a large number for --outFilterMismatchNmax, since the smaller of the two numbers will be used).
Both of these parameters relate to the total number of mismatches in the paired alignment, which I think makes more sense than setting this threshold for each mate separately.
Of course, the mismatches include sequencing errors, SNPs and RNA editing, which is why I think we need to allow quite a few.
In the current ENCODE production we are using --outFilterMismatchNoverLmax 0.04, which means, for instance, 8 MM for a 2x100 pair.

On the other hand, this parameter is not as important for STAR as it is for TopHat/Bowtie1. Unless you enforce end-to-end alignment, STAR will trim reads whenever the number of mismatches exceeds --outFilterMismatchNmax, and may still be able to map the read. Note that mismatches are not counted in the trimmed ("soft-clipped") portion of the reads.
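
The "smaller of the two numbers" rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this post: the function name, the rounding down of the scaled value, and the use of the full paired read length are assumptions, not STAR's actual implementation.

```python
import math


def max_mismatches(n_max: int, n_over_l_max: float, total_length: int) -> int:
    """Return the smaller of the absolute cap and the length-scaled cap.

    n_max        -- value given to --outFilterMismatchNmax
    n_over_l_max -- value given to --outFilterMismatchNoverLmax
    total_length -- combined length of both mates (assumed, e.g. 200 for 2x100)
    """
    # The small epsilon guards against floating-point error before flooring
    # (flooring itself is an assumption made for this sketch).
    scaled = math.floor(n_over_l_max * total_length + 1e-9)
    return min(n_max, scaled)


# Ratio of 0.04 on a 2x100 pair, with a large absolute cap so the ratio decides
print(max_mismatches(n_max=999, n_over_l_max=0.04, total_length=200))  # 8

# With the default absolute cap of 10 the ratio is still the tighter limit
print(max_mismatches(n_max=10, n_over_l_max=0.04, total_length=200))   # 8

# For a 2x150 pair the absolute cap of 10 becomes the binding limit
print(max_mismatches(n_max=10, n_over_l_max=0.04, total_length=300))   # 10
```

The last case shows why a large --outFilterMismatchNmax is needed if the ratio is meant to be the only limit in effect.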

Cheers
Alex

JC Grenier

Jun 16, 2014, 12:33:25 PM
to rna-...@googlegroups.com
Thanks Alex!

Will you implement an option to deactivate that soft clipping in the future? We found that it could be problematic in some of our analyses.

Cheers,

JC

Alexander Predeus

Jun 16, 2014, 3:58:55 PM
to rna-...@googlegroups.com
I've actually tested the effect of changing that parameter. Even in the human genome, the difference between outFilterMismatchNmax values of 2, 6, and 10 was truly minute: less than 1% of total reads overall.

In the mouse genome it was even smaller.

Of course, my testing was not that comprehensive, but I decided to stick with 6 for now and not worry about it.

Alexander Dobin

Jun 18, 2014, 6:48:38 PM
to rna-...@googlegroups.com
Hi Alex,

6 MM for 2x100 reads or shorter is OK. However, when you go down to 2 MM, I bet the mapped length would be reduced significantly, as STAR will try to meet this requirement by trimming the reads.

Cheers
Alex

Alexander Dobin

Jun 18, 2014, 6:50:14 PM
to rna-...@googlegroups.com
Hi JC,

This option is implemented in the latest versions (since 2.3.1v); you would need to use --alignEndsType EndToEnd.
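
As a rough illustration of how these options might be combined on one command line, here is a small Python sketch that only assembles the argument list. The helper name, the index directory, the FASTQ file names, and the value 999 used as "a large number" are all placeholders chosen for this example; only the STAR option names come from this thread and the STAR manual.

```python
import shlex


def star_command(genome_dir, read1, read2, mismatch_ratio=0.04, end_to_end=True):
    """Build a STAR argument list (hypothetical helper, nothing is executed)."""
    cmd = [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        # Ratio-based limit on mismatches for the paired alignment
        "--outFilterMismatchNoverLmax", str(mismatch_ratio),
        # Large absolute cap (arbitrary placeholder) so the ratio is the binding limit
        "--outFilterMismatchNmax", "999",
    ]
    if end_to_end:
        # Disable soft clipping by forcing end-to-end alignment
        cmd += ["--alignEndsType", "EndToEnd"]
    return cmd


# Placeholder paths; replace with your own index and FASTQ files
print(shlex.join(star_command("genome_index", "reads_1.fastq", "reads_2.fastq")))
```

The resulting list could be passed to `subprocess.run` once the paths point at real files.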

Cheers
Alex

Alexander Predeus

Jun 19, 2014, 12:51:23 AM
to rna-...@googlegroups.com
I see! This should be especially important to remember when aligning human experiments, then. Thank you.

Dessislava Mladenova

Feb 15, 2015, 9:26:25 PM
to rna-...@googlegroups.com

Hi Alex,


Does STAR allow you to set the number of mismatches in a seed region, as Bowtie does? I want to allow 3 MM in a 20 bp seed region, and I have currently run STAR with --outFilterMismatchNoverLmax 0.15 for 2x100 bp reads.

Thank you!

Dessi

Alexander Dobin

Feb 20, 2015, 12:58:51 AM
to rna-...@googlegroups.com
Hi Dessi,

STAR does not have a fixed seed area, such as, say, the first 20 bases of a read.
Seed start positions and lengths are adaptive. If you set --outFilterMismatchNoverLmax 0.15, STAR will allow for 0.15*200=30 mismatches in the paired alignment, which seems on the high side. If you really want to align reads with a high mismatch rate, you will need to reduce --seedSearchLmax to ~20 or even less to allow for denser seeding.
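
To make the arithmetic explicit: a rate of 3 mismatches per 20 bases is 0.15, but applied to the whole 2x100 pair it permits 30 mismatches. The Python sketch below works this through; the function names and the rounding down are assumptions for illustration, not STAR internals.

```python
import math


def ratio_for(mismatches: int, length: int) -> float:
    """Mismatch ratio that corresponds to `mismatches` over `length` bases."""
    return mismatches / length


def mismatches_allowed(ratio: float, total_length: int) -> int:
    """Mismatches permitted over the whole pair (rounded down; an assumption)."""
    # Small epsilon guards against floating-point error before flooring
    return math.floor(ratio * total_length + 1e-9)


# 3 mismatches in a 20-base window corresponds to a ratio of 0.15 ...
seed_ratio = ratio_for(3, 20)
print(seed_ratio)                           # 0.15

# ... but over a 2x100 pair (200 bases) the same ratio allows 30 mismatches
print(mismatches_allowed(seed_ratio, 200))  # 30

# For comparison, 6 mismatches over the full pair is a ratio of 0.03
print(ratio_for(6, 200))                    # 0.03
```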

Cheers
Alex

Dessislava Mladenova

Feb 21, 2015, 2:39:46 AM
to rna-...@googlegroups.com
Thanks, Alex!