I have 75 bp, single-end, good-quality reads from the mouse genome, generated with the Smart-seq2 protocol. I aligned them with STAR using the default parameters, which gave the following statistics: uniquely mapped reads: 63.35%, reads mapped to multiple loci: 8.89%, and reads unmapped (too short): 25.62%.
Following previous posts, I then changed these parameters: --seedSearchStartLmax 30 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 30
However, unique alignment increased only to 68.16%, with reads mapped to multiple loci at 12.85% and reads unmapped (too short) at 16.52%.
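For reference, the relaxed settings can be written out as a full STAR invocation. This is a minimal sketch that only assembles and prints the command, without running it. The four filtering flags and their values are the ones quoted above. The genome index directory, FASTQ file name, output prefix, and thread count are placeholders that are not part of the original run.

```python
import shlex

# Relaxed filtering parameters exactly as quoted in the post.
RELAXED_PARAMS = {
    "--seedSearchStartLmax": "30",
    "--outFilterScoreMinOverLread": "0",
    "--outFilterMatchNminOverLread": "0",
    "--outFilterMatchNmin": "30",
}


def build_star_command(genome_dir, fastq, prefix, threads=4):
    """Return the STAR argument list for one single-end FASTQ file.

    genome_dir, fastq, prefix and threads are hypothetical placeholders;
    only the RELAXED_PARAMS values come from the post.
    """
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,          # one file, because reads are single-end
        "--outFileNamePrefix", prefix,
    ]
    for flag, value in RELAXED_PARAMS.items():
        cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    command = build_star_command("mouse_index/", "sample_R1.fastq.gz", "sample_")
    print(shlex.join(command))
```

With 75 bp reads, --outFilterMatchNmin 30 requires at least 30 matched bases, which is 40% of the read length. As far as I understand the STAR manual, the default for both --outFilterScoreMinOverLread and --outFilterMatchNminOverLread is 0.66, which corresponds to roughly 50 bases of a 75 bp read.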
In a previous study, where a similar percentage of reads was unmapped as "too short", the same parameters increased unique alignment from 62% to 85%, with only about a 2-5% increase in reads mapped to multiple loci. The only difference is that those data were paired-end, whereas this dataset is single-end.
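To make the comparison concrete, the change in each category can be computed directly from the percentages quoted above. This is a small arithmetic sketch. The dictionary keys and function name are only illustrative labels, and the previous-study figures are the rounded 62% and 85% given in the post.

```python
# Percentages reported for the single-end dataset.
default_run = {"unique": 63.35, "multi": 8.89, "too_short": 25.62}
relaxed_run = {"unique": 68.16, "multi": 12.85, "too_short": 16.52}


def percentage_point_change(before, after):
    """Difference (after - before) per category, in percentage points."""
    return {key: round(after[key] - before[key], 2) for key in before}


change = percentage_point_change(default_run, relaxed_run)

# Unique-alignment gain in the earlier paired-end study (62% -> 85%).
previous_unique_gain = 85 - 62

if __name__ == "__main__":
    for category, delta in change.items():
        print(f"{category}: {delta:+.2f} percentage points")
    print(f"previous paired-end study, unique: {previous_unique_gain:+d} percentage points")
```

For the single-end data, the "too short" category drops by 9.10 points, but only 4.81 of those go to unique alignments and 3.96 go to multi-mapping reads. In the paired-end study the unique gain was about 23 points.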
I have checked the quality of the reads and it is good. In both datasets, there is 3' adapter contamination in about 10-30% of the dataset, which I did not remove in either study. I suppose STAR's soft clipping takes care of those reads. (I checked, and about 50-70% of the contaminated reads aligned uniquely in the paired-end dataset after using the above parameters.)
Can you please suggest what I should consider to improve the alignment of the single-end reads?
Thank you,
Saumya