Aligning RNA-seq reads to a non-model organism

Urwah Nawaz

Apr 23, 2018, 11:32:12 AM

to rna-star

Hi,

First of all, thanks for having this forum. I can’t believe I missed it earlier when I was doing my analysis. Finding it has already answered a bunch of my questions.

However, I have a really specific question that mainly relates to my project. A while back I used the STAR aligner with default parameters to align the RNA-seq reads of a non-model organism to its closest relative with a fully sequenced genome. Initially this yielded a very low alignment result (a total of 0.91 reads mapped). I then increased the mismatch parameters, but this did not seem to have a huge impact on the results.

Anyway, after I found this forum, I saw that you can change --outFilterMatchNminOverLread and --outFilterScoreMinOverLread to increase mappability. I used the values 0.10, 0.25 and 0.5, with 0.10 giving the highest number of reads mapped. What is the trade-off between setting a lower value for these parameters and getting a higher mapping rate? Obviously my average mapped read length has decreased. Is there also a way I can check which parameter value gives good-quality results?

With the 0.10 parameter, I got an overall unique mapping rate of 63.27% and a multi-mapping rate of 32.66%. Both the genome and my non-model organism are known to have a high repeat content so I’m not surprised at the multi-mapping rate, but I wanted to know if this is a good mapping result for a non-model organism aligned to a distant relative?

I've attached screenshots of both final Log reports for reference.

Any help on this would be great :)

cheers,

Urwah

0.10_parameter.png

mapping_default.png

Alexander Dobin

Apr 25, 2018, 6:08:18 PM

to rna-star

Hi Urwah,

The trade-off, as you stated, is that you are getting short alignments (~50 bases mapped out of 200). The danger is that there may be a lot of misalignments, since only small parts of the reads are mapped.
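To make the trade-off concrete, here is a rough sketch of how few bases a read needs to match at each threshold tried in this thread. It assumes the 200-base read length mentioned above (for paired-end data these parameters are normalized to the sum of the mates' lengths) and includes the documented default of 0.66 for comparison. The helper name is made up for illustration, and STAR's internal rounding may differ slightly.

```python
def min_matched_bases(read_length: int, fraction: float) -> float:
    """Approximate number of matched bases required by
    --outFilterMatchNminOverLread for a read of the given length.
    Hypothetical helper: plain multiplication, ignoring STAR's exact rounding."""
    return fraction * read_length


READ_LENGTH = 200  # read length quoted in this thread

for fraction in (0.66, 0.5, 0.25, 0.10):
    needed = min_matched_bases(READ_LENGTH, fraction)
    print(f"threshold {fraction:.2f}: about {needed:.0f} of {READ_LENGTH} bases must match")
```

At 0.10 an alignment covering roughly a tenth of the read is accepted, which is why short, possibly spurious alignments start to pass the filter.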

Also, the mismatch rate is very high at ~8%. Is this the divergence you expect between your species and the reference?

Generally, STAR may not work well for error rates > 5%.
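These numbers can be pulled from Log.final.out directly rather than read off screenshots. The sketch below is one possible way to do it, assuming the usual "key | value" layout of that file. The sample text is illustrative only, filled in with the approximate figures quoted in this thread, and the function names are made up for this example.

```python
def parse_final_log(text: str) -> dict:
    """Collect 'key | value' lines from the text of a STAR Log.final.out."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def summarize(stats: dict) -> dict:
    """Pick out the metrics discussed in this thread."""
    def pct(value: str) -> float:
        return float(value.rstrip("%"))

    read_len = float(stats["Average input read length"])
    mapped_len = float(stats["Average mapped length"])
    return {
        "unique_pct": pct(stats["Uniquely mapped reads %"]),
        "multi_pct": pct(stats["% of reads mapped to multiple loci"]),
        "mapped_fraction": mapped_len / read_len,
        "mismatch_pct": pct(stats["Mismatch rate per base, %"]),
    }


# Illustrative excerpt only; values approximate those mentioned in the thread.
SAMPLE = """
                      Average input read length |  200
                                    UNIQUE READS:
                        Uniquely mapped reads % |  63.27%
                          Average mapped length |  50.00
                      Mismatch rate per base, % |  8.00%
                             MULTI-MAPPING READS:
             % of reads mapped to multiple loci |  32.66%
"""

summary = summarize(parse_final_log(SAMPLE))
print(summary)
if summary["mismatch_pct"] > 5.0:
    print("Mismatch rate is above the ~5% level where STAR may struggle.")
```

For a real run, replace SAMPLE with the contents of the run's Log.final.out and compare the summaries across parameter settings.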

To increase sensitivity, I would recommend reducing --seedSearchStartLmax from default 50 to 10-20, and increasing --winAnchorMultimapNmax to 200-500.
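As a sketch of what such a run could look like, the snippet below only assembles the STAR command line and prints it; it does not run anything. The genome directory, read files, and output prefix are placeholders, and the values 15 and 200 are simply picked from the ranges suggested above.

```python
import shlex


def star_command(genome_dir, reads, prefix,
                 seed_search_start_lmax=15,
                 win_anchor_multimap_nmax=200,
                 threads=4):
    """Build a STAR argument list with the two sensitivity parameters set.
    All paths are supplied by the caller; nothing is executed here."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
        "--seedSearchStartLmax", str(seed_search_start_lmax),
        "--winAnchorMultimapNmax", str(win_anchor_multimap_nmax),
    ]


# Placeholder paths, for illustration only.
cmd = star_command("relative_genome_index",
                   ["sample_R1.fastq", "sample_R2.fastq"],
                   "sensitive_")
print(shlex.join(cmd))  # inspect the command, then run it yourself if it looks right
```

Building the command as a list makes it easy to loop over several values of each parameter and keep a separate output prefix per run.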

One of the simplest metrics for judging the quality of the alignments as you tweak the mapping parameters is the proportion of unannotated reads: it should decrease with "good" parameter changes.
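One possible way to get this proportion, assuming the run used --quantMode GeneCounts with an annotation for the reference, is to read it from ReadsPerGene.out.tab. That file starts with N_unmapped, N_multimapping, N_noFeature and N_ambiguous rows followed by per-gene counts. The sketch below treats N_noFeature as "unannotated" among uniquely mapped reads; that definition, the function name, and the sample counts are assumptions made for illustration.

```python
def unannotated_fraction(text: str, column: int = 1) -> float:
    """Fraction of uniquely mapped reads that hit no annotated feature.
    column 1 = unstranded counts; 2 and 3 are the two stranded options."""
    no_feature = ambiguous = assigned = 0
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) <= column:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[column])
        if name == "N_noFeature":
            no_feature = count
        elif name == "N_ambiguous":
            ambiguous = count
        elif name in ("N_unmapped", "N_multimapping"):
            continue  # not uniquely mapped, so left out of the denominator
        else:
            assigned += count  # per-gene count
    total = no_feature + ambiguous + assigned
    return no_feature / total if total else float("nan")


# Made-up counts, for illustration only.
SAMPLE_COUNTS = "\n".join([
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t300\t300\t300",
    "N_noFeature\t200\t250\t260",
    "N_ambiguous\t50\t10\t10",
    "geneA\t500\t480\t20",
    "geneB\t250\t240\t10",
])

print(f"unannotated fraction: {unannotated_fraction(SAMPLE_COUNTS):.3f}")
```

Comparing this fraction between parameter settings on the same sample gives a quick check on whether the extra mapped reads land in annotated genes or elsewhere.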

Cheers

Alex
