Interpreting the file - % of reads unmapped: too short?


Alex Chitsazan

Sep 15, 2016, 7:53:41 PM
Hey Alex,

We are having trouble with a sample we just sequenced. This is a pilot study where we are trying to make sure our protocol works on a new organism, so sample prep is most likely the culprit. However, I'm having trouble understanding our low alignment rate (~50%). After looking at the file, we saw that our "% of reads unmapped: too short" value was about 48% and thought it was because our reads were short (maybe primer dimers). However, after making a histogram of sequence length, we found that wasn't the case. Can you explain the file a little more clearly for me? Specifically, why is the too short percentage so high? I will attach the file. For some experimental background, the reads are Illumina NextSeq paired-end 75 bp reads.

Thank you very much,


Alexander Dobin

Sep 20, 2016, 5:51:41 PM
to rna-star
Hi Alex,

"too short" means that the best alignments STAR found were too short to pass the filters.
This is controlled by --outFilterScoreMinOverLread and --outFilterMatchNminOverLread, which are both set to 0.66 by default. This means that ~2/3 of the total read length (sum of mates) must be mapped.
You can try to reduce these parameters to see how many more reads will be mapped.
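
To make the ~2/3 rule concrete, here is a rough Python sketch of the arithmetic for the 2x75 bp reads mentioned above. The function names are made up for illustration and are not part of STAR, and the exact comparison and rounding inside STAR may differ slightly from this simplified version.

```python
# Simplified illustration of the "too short" length filter.
# Hypothetical helper names; STAR's internal check may differ in detail.

DEFAULT_MIN_OVER_LREAD = 0.66  # default of --outFilterMatchNminOverLread


def min_matched_bases(mate_lengths, min_over_lread=DEFAULT_MIN_OVER_LREAD):
    """Return the number of matched bases needed, given the mate lengths."""
    total_length = sum(mate_lengths)  # "sum of mates" for paired-end reads
    return min_over_lread * total_length


def passes_length_filter(matched_bases, mate_lengths,
                         min_over_lread=DEFAULT_MIN_OVER_LREAD):
    """True if the alignment covers enough of the read pair to be kept."""
    return matched_bases >= min_matched_bases(mate_lengths, min_over_lread)


if __name__ == "__main__":
    pair = (75, 75)  # paired-end 75 bp reads
    print("bases required:", min_matched_bases(pair))
    # Only one mate aligned in full: 75 of 150 bases
    print("one mate only:", passes_length_filter(75, pair))
    # Same alignment with the threshold lowered to 0.3
    print("one mate only, 0.3:", passes_length_filter(75, pair, 0.3))
```

Under these assumptions, a pair in which only one 75 bp mate aligns covers half of the 150 bp total, so it falls below the default threshold and would be counted as "too short" even though the reads themselves are full length.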

One possibility is that you have short inserts, but it looks like you trimmed the reads before mapping?

Also, you can try mapping reads 1 and 2 separately to see whether one of the mates is of poorer quality.
