Hi Smitha,
If I understand correctly, you are asking about the soft-clip operation "S" in the CIGAR string?
STAR performs so-called local alignment of the read sequence to the genome, as opposed to the end-to-end (semi-global) alignment performed by many DNA aligners such as bowtie1.
This means that STAR will try to maximize the alignment score by "extending" the alignment towards the ends of the read. However, it will not force a "full-length" alignment from the first to the last base of the read. The score is the sum of +1 for each match and -1 for each mismatch.
Consider the following example:
A A A A C A A A C C A C A   read
| | | | X | | | X X | X |
A A A A A A A A A A A A A   genome
+ + + + - + + + - - + - +   +/-1 for matches/mismatches
1 2 3 4 3 4 5 6 5 4 5 4 5   alignment score
              ^ max score
8M5S, nM=1
STAR will "extend" the alignment to include the first 8 bases (where the alignment score reaches its maximum) and will soft-clip the remaining 5 bases.
The alignment length is 8, and the number of mismatches is 1, because only mismatches in the aligned portion of the read are counted.
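The extension in the example above can be reproduced with a short Python sketch. This is a deliberately simplified toy, not STAR's implementation: it assumes an ungapped alignment anchored at the first read base, extends in one direction only, and uses just the +1/-1 scoring described above (no splice junctions, indels, or other scoring terms). The function name `local_extend` is hypothetical.

```python
def local_extend(read: str, genome: str) -> tuple[str, int, int]:
    """Toy ungapped local extension (NOT STAR's actual algorithm).

    Walks from the first read base to the right, scoring +1 per match and
    -1 per mismatch, keeps the prefix where the running score peaks, and
    soft-clips the rest. Returns (cigar, mismatches_in_aligned_part, best_score).
    """
    score = 0
    best_score, best_len = 0, 0
    for i, (r, g) in enumerate(zip(read, genome), start=1):
        score += 1 if r == g else -1
        # Strict '>' means that on a tie the shorter alignment is kept.
        if score > best_score:
            best_score, best_len = score, i
    # Mismatches are counted only inside the aligned (M) part.
    mismatches = sum(r != g for r, g in zip(read[:best_len], genome[:best_len]))
    clipped = len(read) - best_len
    cigar = (f"{best_len}M" if best_len else "") + (f"{clipped}S" if clipped else "")
    return cigar, mismatches, best_score


read = "AAAACAAACCACA"
genome = "AAAAAAAAAAAAA"
print(local_extend(read, genome))  # ('8M5S', 1, 6)

# For comparison, a forced end-to-end (full-length) alignment of the same read:
end_to_end = sum(1 if r == g else -1 for r, g in zip(read, genome))
print(end_to_end)  # 5, with 4 mismatches instead of 1
```

The local alignment scores 6 while the full-length one scores only 5, which is why the last 5 bases end up soft-clipped.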
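So if you see something like 20S, it means that at least 10 of those 20 bases are mismatched: if more than half of them matched, extending the alignment over them would have increased the score.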
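The "at least half" argument can be turned into a quick check on CIGAR strings. A minimal sketch, assuming the same toy +1/-1 model (in practice clipping can also have other causes, so this is only a bound under that model); the helper names `soft_clips` and `min_clipped_mismatches` are hypothetical, while the CIGAR operation codes follow the SAM specification.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str) -> tuple[int, int]:
    """Return (leading, trailing) soft-clip lengths of a CIGAR string."""
    # Hard clips (H) may only sit outside soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def min_clipped_mismatches(clip_len: int) -> int:
    """Lower bound on mismatches in a clipped tail under toy +1/-1 scoring.

    If matches outnumbered mismatches in the tail, extending over it would
    raise the score, so mismatches >= clip_len / 2, i.e. ceil(clip_len / 2).
    """
    return (clip_len + 1) // 2


for cigar in ("8M5S", "20S80M", "3H5S40M2S"):
    lead, trail = soft_clips(cigar)
    print(cigar, lead, trail, min_clipped_mismatches(lead), min_clipped_mismatches(trail))
```

For the 8M5S example the bound is 3 mismatches in the clipped tail, and the tail C C A C A indeed has exactly 3.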
This can have many causes: low-quality read tails, adapter sequence, poly(A) tails, or 3'-end RNA modifications.
However, the most likely reason is that STAR could not find a good spliced alignment to accommodate those 20 bases.
That's why I would not recommend counting the mismatches in the soft-clipped portions of the read.
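If you post-process alignments yourself, this means comparing read and reference only over the aligned blocks. A minimal pure-Python sketch, assuming you already have the read sequence, the CIGAR string, and the reference sequence starting at the first aligned base (SAM POS); the function name `aligned_mismatches` is hypothetical. Operation semantics follow the SAM specification: M/=/X consume both read and reference, I and S only the read, D and N (spliced introns) only the reference.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_mismatches(read: str, ref: str, cigar: str) -> int:
    """Count read/reference mismatches in M/=/X blocks only.

    Soft-clipped bases (S) are skipped on the read and never compared.
    Insertions and deletions themselves are not counted as mismatches.
    """
    r = 0  # position in the read
    g = 0  # position in the reference (0 = first aligned base)
    mismatches = 0
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        if op in "M=X":
            mismatches += sum(a != b for a, b in zip(read[r:r + n], ref[g:g + n]))
            r += n
            g += n
        elif op in "IS":
            r += n
        elif op in "DN":
            g += n
        # H and P consume neither the stored read nor the reference.
    return mismatches


read = "AAAACAAACCACA"
ref = "AAAAAAAAAAAAA"
print(aligned_mismatches(read, ref, "8M5S"))  # 1
print(sum(a != b for a, b in zip(read, ref)))  # 4 if clipped bases were counted too
```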
Cheers
Alex