Misalignment of a paired-end read, while correctly aligned with single-read


Saeed Omidi

Sep 20, 2017, 1:30:04 PM
to rna-star
Hi 

I've recently encountered a case in which STAR aligns both mates of a paired-end read correctly when each is aligned in single-end mode, whereas in paired-end mode it always aligns the second mate wrongly. Here is an example:

R1: 

@seq1
GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII


 R2:

@seq1
CCGATGGATGGTACTTCTTCTGCTCTCAGCATGGGACCTTTTGTTCCCTTCATTATGAGTTGTGGTCAATCACCTGTCTACCACTCC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Note that R2 is the reverse complement of R1; otherwise they are the same sequence.
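The relationship between the two mates can be checked programmatically. The following is only a minimal sketch in plain Python; the helper names `reverse_complement` and `compare_mates` are made up for illustration and are not part of STAR or any library. It reverse-complements R2 and reports both lengths and whether the result equals R1.

```python
# Hypothetical helpers for illustration only; not part of STAR.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def compare_mates(r1: str, r2: str) -> tuple:
    """Return (len(r1), len(revcomp(r2)), whether the two are identical)."""
    rc2 = reverse_complement(r2)
    return len(r1), len(rc2), r1 == rc2


if __name__ == "__main__":
    # Sequences copied from the two FASTQ records above.
    r1 = ("GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGC"
          "TGAGAGCAGAAGAAGTACCATCCATCGG")
    r2 = ("CCGATGGATGGTACTTCTTCTGCTCTCAGCATGGGACCTTTTGTTCCCTTCATTATGA"
          "GTTGTGGTCAATCACCTGTCTACCACTCC")
    print(compare_mates(r1, r2))
```

A length difference or a `False` in the last field would indicate that the mates are not exact reverse complements of each other.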


Both mates should align to THADA (NM_001083953), across the junction between exons 28 and 29. However, when I align the pair (with STAR's default options) to hg19, I always get the right alignment for R1 but a wrong one for R2. The correct alignment for these two reads is at chr2:43625252 with the CIGAR 27M29960N59M. Instead, the second mate is always placed at chr2:43655230 with soft-clipping (19S68M). In this situation, instead of aligning across the junction, STAR wrongly tries to align the second mate through the intron, accepting mismatches and soft-clipping.

Surprisingly, when I run STAR in single-end mode on each mate separately, it aligns both reads correctly. Does that imply that STAR assumes the two mates of a pair cannot be exactly the same sequence? If so, is there a way around it?

Thanks a lot,
Saeed 

Alexander Dobin

Oct 10, 2017, 5:16:21 PM
to rna-star
Hi Saeed,

sorry for the delayed reply, this case was a bit complicated.
The PE alignment of this read should look like:

seq1    99      chr2    43398113        255     27M29960N59M    =       43398112        30046   GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG      *   NH:i:1      HI:i:1  AS:i:167        nM:i:3
seq1    147     chr2    43398112        255     28M29960N59M    =       43398113        -30046  GGAGTGGTAGACAGGTGATTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG   *   NH:i:1      HI:i:1  AS:i:167        nM:i:3

The 2nd mate's alignment starts 1 base earlier than the 1st mate's. By default, such protruding alignments are not allowed by STAR.
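The 1-base protrusion can be read off the two SAM records above. The sketch below is plain Python with hypothetical helper names (`reference_span`, `protruding_bases`). It assumes a simplified reading of "protrusion" as the number of bases by which the reverse-strand mate starts before the forward-strand mate, which may not match STAR's internal definition exactly.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
_REF_CONSUMING = set("MDN=X")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in _REF_CONSUMING)


def protruding_bases(forward_start: int, reverse_start: int) -> int:
    """Bases by which the reverse-strand mate starts before the forward mate.

    Simplified illustration only; an assumption, not STAR's own code.
    """
    return max(0, forward_start - reverse_start)


if __name__ == "__main__":
    # POS and CIGAR values taken from the two SAM lines above.
    mate1_start, mate1_cigar = 43398113, "27M29960N59M"  # flag 99, forward
    mate2_start, mate2_cigar = 43398112, "28M29960N59M"  # flag 147, reverse
    mate1_end = mate1_start + reference_span(mate1_cigar) - 1
    mate2_end = mate2_start + reference_span(mate2_cigar) - 1
    print(mate1_end, mate2_end)
    print(protruding_bases(mate1_start, mate2_start))
```

With these coordinates both mates end at the same reference position, and the reverse mate extends one base to the left of the forward mate's start.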
There is an option to allow them: --alignEndsProtrude 10 ConcordantPair , where the first value is the maximum number of protruding bases allowed (0 by default) and the second word tells STAR whether to mark such alignments as ConcordantPair or DiscordantPair.
However, there was a bug in implementation of this option, which prevented the output of this alignment.
I have fixed it, and STAR now outputs the alignment as shown above. Please check out the GitHub master: https://github.com/alexdobin/STAR/archive/master.zip
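For reference, a paired-end run with this option might be assembled as in the sketch below. It is a hedged illustration, not a recommended pipeline: the helper `star_command`, the genome index directory, the FASTQ file names, and the output prefix are all placeholders. Only `--alignEndsProtrude 10 ConcordantPair` comes from this thread; the other flags are standard STAR options.

```python
import shlex


def star_command(genome_dir, r1_fastq, r2_fastq,
                 protrude=10, pair_type="ConcordantPair",
                 prefix="star_pe_"):
    """Build a STAR argument list for a paired-end run (hypothetical helper)."""
    if pair_type not in ("ConcordantPair", "DiscordantPair"):
        raise ValueError("pair_type must be ConcordantPair or DiscordantPair")
    return [
        "STAR",
        "--genomeDir", genome_dir,               # placeholder index path
        "--readFilesIn", r1_fastq, r2_fastq,     # mate 1, then mate 2
        "--alignEndsProtrude", str(protrude), pair_type,
        "--outFileNamePrefix", prefix,           # placeholder output prefix
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with your own index and FASTQ files.
    cmd = star_command("hg19_index", "seq1_R1.fastq", "seq1_R2.fastq")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems if it is later passed to `subprocess.run`.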

Cheers
Alex