Hi Alexandra,
thanks a lot for the interesting tests and observations.
1) Soft-clipping issue
This is actually unintended behavior in my code (a.k.a. a bug).
The actual filter for single-end alignments:
the alignment is "too short" if nMatchedBases < int(outFilterMatchNminOverLread*(Lread-1))
For your 1st read, int(outFilterMatchNminOverLread*(Lread-1)) = int(0.85*(32-1)) = int(26.35) = 26, which equals nMatchedBases (26), so the alignment is not "too short" and passes the filter.
For your 2nd read, the number of matched bases is also 26, since there is one mismatch, and it passes the filter.
For PE reads, the formula is nMatchedBases < int(outFilterMatchNminOverLread*Lread)
This is an unfortunate historical issue in the code, which I do not want to fix at present, since the fix would "silently" change the results and make them incompatible with older versions.
Basically, if you need more stringent alignments, you would have to increase --outFilterMatchNminOverLread slightly.
In general, controlling the minimum score with --outFilterScoreMinOverLread is more robust than controlling the number of matched bases.
For instance, your first read has a lower score than the 2nd (21 vs 24), because it has a very large non-canonical gap.
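The single-end and paired-end filters described above can be sketched in a few lines of Python. This is only an illustration of the two formulas as written in this message, not the actual code; the function names are made up for the example, and the paired-end call reuses Lread=32 purely to show the difference between the two formulas.

```python
def min_matched_bases(l_read, min_over_lread, paired_end=False):
    """Threshold on nMatchedBases, following the formulas quoted above.

    Single-end uses (Lread-1), paired-end uses Lread (the historical quirk).
    Function and argument names are hypothetical.
    """
    effective_length = l_read if paired_end else l_read - 1
    return int(min_over_lread * effective_length)


def is_too_short(n_matched_bases, l_read, min_over_lread, paired_end=False):
    """True if the alignment would be rejected as "too short"."""
    return n_matched_bases < min_matched_bases(l_read, min_over_lread, paired_end)


# Single-end case from this thread: Lread=32, outFilterMatchNminOverLread=0.85
print(min_matched_bases(32, 0.85))         # threshold for single-end
print(is_too_short(26, 32, 0.85))          # 26 matched bases, single-end
# Same numbers pushed through the paired-end formula, for comparison only
print(min_matched_bases(32, 0.85, paired_end=True))
print(is_too_short(26, 32, 0.85, paired_end=True))
```

With these numbers the single-end threshold is 26, so 26 matched bases pass, while the paired-end formula gives 27 and the same 26 bases would be rejected.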
2) About seeding
3) About winAnchorMultimapNmax
The results with reducing --seedSearchStartLmax and increasing --winAnchorMultimapNmax both make sense: each shows an increase in sensitivity.
Reducing --seedSearchStartLmax allows for more "dense" (in the read sequence) search of seeds.
Increasing --winAnchorMultimapNmax allows more (and shorter) seeds to be tried as "anchors", i.e. explore more genomic loci for potential alignments.
Note that in both cases the mapping speed decreases; this is the main limitation for further gains in sensitivity.
Also, at some point the storage allocated for seeds and windows may need to be increased (see parameters that control it below).
There is yet another parameter that you can try to tweak to increase sensitivity: --seedMultimapNmax (=10000 by default).
It defines the maximum number of loci that non-anchor seeds can map to. It will affect catching short overhangs of junctions and indels.
In my hands it never gave a big sensitivity boost, and it could also lead to an increase of false junctions with large gaps.
If you try to increase it, do it on an exponential scale, say by factors of 10 and 100.
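A minimal sketch of such an exponential sweep, assuming you drive the runs from a Python script. Only the --seedMultimapNmax flag and its default of 10000 come from this message; the helper names are hypothetical, and the remaining arguments (genome, reads, output prefix) are left for you to fill in via extra_args.

```python
DEFAULT_SEED_MULTIMAP_NMAX = 10000  # default value quoted in this message


def seed_multimap_sweep(factors=(1, 10, 100)):
    """Values of --seedMultimapNmax to try: the default times each factor."""
    return [DEFAULT_SEED_MULTIMAP_NMAX * factor for factor in factors]


def star_args(seed_multimap_nmax, extra_args=()):
    """Build an argument list for one run (hypothetical helper).

    extra_args is a placeholder for your usual options; nothing is executed here.
    """
    return ["STAR", "--seedMultimapNmax", str(seed_multimap_nmax), *extra_args]


for value in seed_multimap_sweep():
    print(" ".join(star_args(value)))
```

Each printed line is one command to run and compare; keeping the default in the sweep gives a baseline for both sensitivity and false junctions with large gaps.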
Cheers
Alex
alignWindowsPerReadNmax 10000
    int>0: max number of windows per read
alignTranscriptsPerWindowNmax 100
    int>0: max number of transcripts per window
alignTranscriptsPerReadNmax 10000
    int>0: max number of different alignments per read to consider
seedPerReadNmax 1000
    int>0: max number of seeds per read
seedPerWindowNmax 50
    int>0: max number of seeds per window
seedNoneLociPerWindow 10
    int>0: max number of one seed loci per window