Alignment scores different for +/- strands?

44 views
Skip to first unread message
Assigned to ado...@gmail.com by me

B Moreland

unread,
Nov 23, 2015, 3:00:55 PM11/23/15
to rna-...@googlegroups.com
Hello,

I have a very large set of reads and, for post-processing reasons, I used STAR to align the reads to the organism's reference genome and then also to the reverse complement of that reference genome (then I should be able to take the "forward" set of alignments from each run to get the complete set). However, when I aligned to the reverse-complement genome and then converted the results to "forward" coordinates, the results didn't match up in a small fraction of cases (and overall, there were slightly fewer uniquely aligned reads and splice junctions). As an example, I isolated some alignments to chr XIII and just ran STAR on those reads to align to chr XIII and chr XIII_minus fasta files. For the following read, these are the two alignments found:

SNXXXX:XXX:8844:15231  272     chrXIII_minus   213483  3       2S12M247348N36M *       0       0       GTGCTCTTCCGACTGACAAGCGCCCATTCTGACCATTAAACTATCACGGA      IGIIHJIGJJJIJIHHGHJIJIIIHH@DIJJJJJJIIHHFHFFDFFFCC@      NH:i:2  HI:i:1  AS:i:36 nM:i:0
SNXXXX:XXX:8844:15231  0       chrXIII 463554  3       38M247348N10M2S *       0       0       TCCGTGATAGTTTAATGGTCAGAATGGGCGCTTGTCAGTCGGAAGAGCAC      @CCFFFDFFHFHHIIJJJJJJID@HHIIIJIJHGHHIJIJJJGIJHIIGI      NH:i:2  HI:i:2  AS:i:40 nM:i:2

(chr XIII is 924431 bp long so in this instance it maps back to the same genomic location.) I understand that given STAR's rules, these two alignments are scored differently, but why wouldn't the reverse-complement equivalent of the bottom alignment appear as an equally-scored alignment? (Or, for that matter, the reverse-complement equivalent of the top). Thus, is the alignment to chrXIII_minus at position 213483 with CIGAR string "2S10M247348N38M" not being found? Or not being scored appropriately? Or might there be a keyword that I should have set? (I only set the keywords: outFilterMultimapNmax=100, outFilterMultimapScoreRange=10)

Any help would be appreciated! If there's anything else I should provide, let me know.
Thank you!

Blythe


Alexander Dobin

unread,
Dec 2, 2015, 5:35:09 PM12/2/15
to rna-star
Hi Blythe,

this problem is caused by a small strand asymmetry in STAR algorithm.
STAR assembles alignments from reads always with respect to the +strand of the chromosomes.
In this case, it finds the canonical junction with two mismatches on chrXIII, and 2nt shifted non-canonical junction with 2 mismatches on chrXIII_minus.
Such problems should affect only a very very small % of reads. You can reduce this number by increasing --scoreStitchSJshift (=1 by default), this parameter controls how hard will STAR look for canonical intron motifs vs non-canonical. Setting this parameter to 2 yields identical alignments on both chromosomes, but for other reads this parameter may have to be increased even more.

At some point in the future I will try to make the algorithm perfectly strand-symmetric.

Cheers
Alex
Reply all
Reply to author
Forward
0 new messages