Aligning RNA-seq reads to a non-model organism

Urwah Nawaz

Apr 23, 2018, 11:32:12 AM
to rna-star

Hi,

 

First of all, thanks for having this forum. I can’t believe I missed it earlier on when I was doing my analysis. Finding this forum has already answered a bunch of my questions.

 

However, I have a really specific question that mainly relates to my project. A while back I used the STAR aligner with default parameters to align the RNA-seq reads of a non-model organism to its closest relative with a fully sequenced genome. Initially this yielded a very low alignment rate (a total of 0.91% of reads mapped). I then increased the mismatch parameters; however, this did not seem to have a huge impact on the results.

 

Anyway, after I found this forum, I saw that you can change --outFilterMatchNminOverLread and --outFilterScoreMinOverLread to increase mappability. I used the values 0.10, 0.25 and 0.5, with 0.10 giving the highest number of mapped reads. What is the trade-off between setting a lower value for these parameters and getting a higher mapping rate? Obviously my average mapped read length has decreased. Is there also a way I can check which parameter setting gives good-quality results?

 

With the 0.10 parameter, I got an overall unique mapping rate of 63.27% and a multi-mapping rate of 32.66%. Both the genome and my non-model organism are known to have a high repeat content so I’m not surprised at the multi-mapping rate, but I wanted to know if this is a good mapping result for a non-model organism aligned to a distant relative? 


I've attached the screenshots of both the final Log reports for reference. 


Any help on this would be great :) 



cheers, 

Urwah

0.10_parameter.png
mapping_default.png

Alexander Dobin

Apr 25, 2018, 6:08:18 PM
to rna-star
Hi Urwah,

The trade-off, as you stated, is that you are getting short alignments (~50 bases mapped length out of 200). The danger is that there may be a lot of misalignments, as only small parts of the reads are mapped.
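
To make the trade-off concrete, here is a small arithmetic sketch of roughly how many matched bases a read needs to pass --outFilterMatchNminOverLread at each threshold tried above. It assumes the 200-base read length quoted above and uses STAR's default of 0.66 for comparison; the function name is made up for illustration, and STAR's exact internal rounding may differ slightly.

```python
# Illustrative arithmetic only, not STAR's exact filtering code.
def approx_min_matched_bases(read_length: int, min_over_lread: float) -> int:
    """Approximate number of matched bases required by the ratio filter."""
    return round(read_length * min_over_lread)

read_length = 200  # read length quoted in this thread

# 0.66 is STAR's default; the others are the values tried in the question.
for ratio in (0.66, 0.5, 0.25, 0.10):
    needed = approx_min_matched_bases(read_length, ratio)
    print(f"ratio {ratio:.2f}: about {needed} of {read_length} bases must match")
```

At 0.10 only about 20 of 200 bases have to match, which is why many short and potentially spurious alignments can pass the filter.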

Also, the mismatch rate is very high at ~8%. Is this the divergence you expect between your species and the reference?
Generally, STAR may not work well for error rates > 5%.
To increase sensitivity, I would recommend reducing --seedSearchStartLmax from the default of 50 to 10-20, and increasing --winAnchorMultimapNmax to 200-500.
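
As a rough sketch of how these two settings could be passed to STAR, the snippet below only assembles the command line and prints it. The index directory, FASTQ names, output prefix, thread count, and the helper name are placeholders, and the values 15 and 200 are just picked from within the suggested ranges.

```python
import shlex

def build_star_command(genome_dir, reads, out_prefix,
                       seed_search_start_lmax=15,
                       win_anchor_multimap_nmax=200,
                       threads=4):
    """Assemble a STAR command line as a list of arguments (nothing is run)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
        # Shorter seed search start length (default 50) for more sensitivity.
        "--seedSearchStartLmax", str(seed_search_start_lmax),
        # Allow more multimapping anchors per window (default 50).
        "--winAnchorMultimapNmax", str(win_anchor_multimap_nmax),
    ]

# Hypothetical paths, for illustration only.
cmd = build_star_command("genome_index",
                         ["reads_1.fastq", "reads_2.fastq"],
                         "sensitive_")
print(shlex.join(cmd))
```

With STAR installed, the same list could be handed to `subprocess.run(cmd, check=True)`; expect both changes to increase run time.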

One of the simplest metrics for judging the quality of the alignments as you tweak the mapping parameters is the proportion of unannotated reads: it should decrease with "good" parameter changes.
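
One possible way to compute such a proportion, sketched under several assumptions: STAR was run with --quantMode GeneCounts and a GTF annotation for the reference, so that a ReadsPerGene.out.tab file exists, and "unannotated" is taken to mean the N_noFeature count among uniquely mapped reads. The function name and the sample counts below are invented for illustration; other counting tools would work just as well.

```python
import io

SKIPPED_ROWS = {"N_unmapped", "N_multimapping"}  # not uniquely mapped

def unannotated_fraction(handle, count_column=1):
    """Fraction of uniquely mapped reads that overlap no annotated feature.

    count_column: 1 = unstranded, 2 or 3 = the stranded count columns.
    """
    no_feature = ambiguous = assigned = 0
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= count_column:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[count_column])
        if name in SKIPPED_ROWS:
            continue
        if name == "N_noFeature":
            no_feature = count
        elif name == "N_ambiguous":
            ambiguous = count
        else:
            assigned += count  # per-gene counts
    total = no_feature + ambiguous + assigned
    return no_feature / total if total else float("nan")

# Made-up counts in the ReadsPerGene.out.tab layout, for illustration only.
sample = io.StringIO(
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t300\t300\t300\n"
    "N_noFeature\t200\t500\t250\n"
    "N_ambiguous\t50\t10\t40\n"
    "geneA\t500\t300\t400\n"
    "geneB\t250\t190\t310\n"
)
fraction = unannotated_fraction(sample)
print(f"unannotated fraction: {fraction:.2f}")
```

Comparing this fraction across runs with different parameter values gives one number per run to track alongside the mapping rate.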

Cheers
Alex