Alignment scores different for +/- strands?

B Moreland

Nov 23, 2015, 3:00:55 PM

to rna-...@googlegroups.com

Hello,

I have a very large set of reads and, for post-processing reasons, I used STAR to align them to the organism's reference genome and also to the reverse complement of that reference genome (I should then be able to take the "forward" set of alignments from each run to get the complete set). However, when I aligned to the reverse-complement genome and converted the results to "forward" coordinates, the results didn't match in a small fraction of cases (and overall there were slightly fewer uniquely aligned reads and splice junctions). As an example, I isolated some alignments to chrXIII and ran STAR on just those reads against the chrXIII and chrXIII_minus FASTA files. For the following read, these are the two alignments found:

SNXXXX:XXX:8844:15231 272 chrXIII_minus 213483 3 2S12M247348N36M * 0 0 GTGCTCTTCCGACTGACAAGCGCCCATTCTGACCATTAAACTATCACGGA IGIIHJIGJJJIJIHHGHJIJIIIHH@DIJJJJJJIIHHFHFFDFFFCC@ NH:i:2 HI:i:1 AS:i:36 nM:i:0
SNXXXX:XXX:8844:15231 0 chrXIII 463554 3 38M247348N10M2S * 0 0 TCCGTGATAGTTTAATGGTCAGAATGGGCGCTTGTCAGTCGGAAGAGCAC @CCFFFDFFHFHHIIJJJJJJID@HHIIIJIJHGHHIJIJJJGIJHIIGI NH:i:2 HI:i:2 AS:i:40 nM:i:2

(chrXIII is 924431 bp long, so in this instance both alignments map back to the same genomic location.) I understand that, given STAR's rules, these two alignments are scored differently, but why doesn't the reverse-complement equivalent of the bottom alignment appear as an equally scored alignment? (Or, for that matter, the reverse-complement equivalent of the top one?) In other words, is the alignment to chrXIII_minus at position 213483 with CIGAR string "2S10M247348N38M" not being found? Or is it not being scored appropriately? Or is there a parameter that I should have set? (I only set the parameters outFilterMultimapNmax=100 and outFilterMultimapScoreRange=10.)
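For reference, the coordinate conversion behind the numbers above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of STAR: the function name `reverse_strand_alignment` is hypothetical, positions are assumed to be 1-based as in SAM, and the read sequence and qualities (which would also need to be reverse-complemented and reversed) are not handled.

```python
import re

def reverse_strand_alignment(pos, cigar, chrom_len):
    """Map a 1-based alignment start and CIGAR string onto the
    reverse-complement coordinate system of the same chromosome.
    (Hypothetical helper, written only to illustrate the arithmetic.)"""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Only reference-consuming operations contribute to the aligned span.
    ref_span = sum(int(n) for n, op in ops if op in "MDN=X")
    end = pos + ref_span - 1          # last aligned reference base
    new_pos = chrom_len - end + 1     # that base becomes the new start
    new_cigar = "".join(n + op for n, op in reversed(ops))
    return new_pos, new_cigar

CHR_XIII_LEN = 924431

# Alignment reported on chrXIII_minus, converted to chrXIII coordinates.
print(reverse_strand_alignment(213483, "2S12M247348N36M", CHR_XIII_LEN))
# Alignment reported on chrXIII, converted to chrXIII_minus coordinates.
print(reverse_strand_alignment(463554, "38M247348N10M2S", CHR_XIII_LEN))
```

The first call gives start 463554, the same start as the chrXIII alignment, but with the junction placed 2 nt differently (36M...12M instead of 38M...10M). The second call gives position 213483 with CIGAR 2S10M247348N38M, which is the alignment asked about above.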

Any help would be appreciated! If there's anything else I should provide, let me know.
Thank you!

Blythe

Alexander Dobin

Dec 2, 2015, 5:35:09 PM

to rna-star

Hi Blythe,

This problem is caused by a small strand asymmetry in the STAR algorithm.

STAR always assembles alignments from reads with respect to the + strand of the chromosomes.

In this case, it finds the canonical junction with two mismatches on chrXIII (nM:i:2, AS:i:40), and a non-canonical junction shifted by 2 nt with no mismatches on chrXIII_minus (nM:i:0, AS:i:36).

Such problems should affect only a very small percentage of reads. You can reduce this number by increasing --scoreStitchSJshift (default 1); this parameter controls how hard STAR looks for canonical intron motifs versus non-canonical ones. Setting it to 2 yields identical alignments on both chromosomes for this read, but for other reads it may have to be increased even more.
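As a rough sketch of how the suggested setting could be passed, the snippet below assembles a STAR command line in Python. The genome directory, read file, and output prefix are placeholder assumptions, and the helper name `build_star_command` is hypothetical; the two multimapper settings are the ones mentioned in the original question, and only --scoreStitchSJshift is changed from its default.

```python
import shlex

def build_star_command(genome_dir, reads, out_prefix, sj_shift=2):
    """Assemble a STAR argument list (hypothetical helper; all paths
    passed in are placeholders chosen for illustration)."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", reads,
        "--outFileNamePrefix", out_prefix,
        # Settings quoted in the original question.
        "--outFilterMultimapNmax", "100",
        "--outFilterMultimapScoreRange", "10",
        # Raised from the default of 1, as suggested above.
        "--scoreStitchSJshift", str(sj_shift),
    ]

cmd = build_star_command("genome_chrXIII_minus", "reads.fastq", "minus_")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Running the same command against both the forward and the reverse-complement genome index, and then comparing the converted alignments, would show whether a given value of the parameter is large enough for the reads in question.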

At some point in the future I will try to make the algorithm perfectly strand-symmetric.

Cheers

Alex
