Misalignment of a paired-end read that aligns correctly in single-end mode

Saeed Omidi

Sep 20, 2017, 1:30:04 PM

to rna-star

I've recently encountered a case in which STAR correctly aligns the mates of a paired-end read when each mate is aligned in single-end mode, whereas in paired-end mode it always misaligns the second mate. Here is an example:

R1:

@seq1

GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

R2:

@seq1

CCGATGGATGGTACTTCTTCTGCTCTCAGCATGGGACCTTTTGTTCCCTTCATTATGAGTTGTGGTCAATCACCTGTCTACCACTCC

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Note that R2 is the reverse complement of R1; otherwise they're exactly the same sequence.
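A claim like this is easy to check programmatically. The minimal Python sketch below (the helper names `revcomp` and `mismatch_positions` are my own, not part of STAR) reverse-complements R2 and compares it with R1, using the sequences exactly as posted above. With these sequences, revcomp(R2) comes out one base longer than R1 (an extra leading G), and after dropping that base it still differs from R1 at one position.

```python
# Minimal sketch: compare revcomp(R2) with R1 for the posted pair.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatch_positions(a: str, b: str) -> list:
    """0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

R1 = "GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG"
R2 = "CCGATGGATGGTACTTCTTCTGCTCTCAGCATGGGACCTTTTGTTCCCTTCATTATGAGTTGTGGTCAATCACCTGTCTACCACTCC"

rc = revcomp(R2)
print(len(R1), len(R2))                 # 86 87
print(rc == R1)                         # False
# Drop the extra leading base of revcomp(R2), then compare base by base.
print(mismatch_positions(rc[1:], R1))   # [17]
```

The one-base length difference is worth keeping in mind when reading the SAM records in the reply below.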

Both mates should align to THADA (NM_001083953), across the junction between exons 28 and 29. However, when I align the pair (with STAR's default options) to hg19, I always get the right alignment for R1 but a wrong one for R2. A correct alignment for these two reads is at chr2:43625252 with the CIGAR 27M29960N59M. Instead, the second mate always gets a wrong alignment (chr2:43655230) with soft-clipping (19S68M). In this situation, instead of aligning across the junction, STAR wrongly aligns the second mate through the intron, accepting mismatches and soft-clipping.
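The two CIGAR strings above can be compared by how many read bases and reference bases each consumes, using the operation definitions from the SAM specification (M, I, S, =, X consume read bases; M, D, N, =, X consume reference bases). This is a minimal Python sketch with hypothetical helper names. It shows that 27M29960N59M covers 86 read bases (R1's length) with a 29960-base intron skip, while 19S68M covers 87 read bases (R2's length) with 19 of them soft-clipped.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read bases
REF_OPS = set("MDN=X")    # operations that consume reference bases

def parse_cigar(cigar: str) -> list:
    """Split a CIGAR string into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject malformed strings: the parsed pieces must rebuild the input.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops

def query_length(cigar: str) -> int:
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)

def reference_span(cigar: str) -> int:
    return sum(n for n, op in parse_cigar(cigar) if op in REF_OPS)

def soft_clipped(cigar: str) -> int:
    return sum(n for n, op in parse_cigar(cigar) if op == "S")

for cigar in ("27M29960N59M", "19S68M"):
    print(cigar, query_length(cigar), reference_span(cigar), soft_clipped(cigar))
# 27M29960N59M 86 30046 0
# 19S68M 87 68 19
```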

Surprisingly, when I run STAR in single-end mode on each mate separately, STAR aligns the reads correctly. Does that imply that STAR assumes the two mates of a pair must not be exactly the same? If so, is there a way around it?

Thanks a lot,

Saeed

Alexander Dobin

Oct 10, 2017, 5:16:21 PM

to rna-star

Hi Saeed,

sorry for the delayed reply; this case was a bit complicated.

The PE alignment of this read pair should look like this:

seq1 99 chr2 43398113 255 27M29960N59M = 43398112 30046 GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG * NH:i:1 HI:i:1 AS:i:167 nM:i:3

seq1 147 chr2 43398112 255 28M29960N59M = 43398113 -30046 GGAGTGGTAGACAGGTGATTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG * NH:i:1 HI:i:1 AS:i:167 nM:i:3
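The second field of each record above is the SAM FLAG, a bit field. This is a small Python sketch (helper names are my own) that decodes it using the bit meanings from the SAM specification: 99 marks mate 1 on the forward strand with its mate reversed, and 147 marks mate 2 on the reverse strand, both in a proper pair.

```python
# Bit meanings of the FLAG field, from the SAM specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "properly paired",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails QC",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}

def decode_flag(flag: int) -> list:
    """Names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]

for flag in (99, 147):
    print(flag, decode_flag(flag))
# 99 ['read paired', 'properly paired', 'mate reverse strand', 'first in pair']
# 147 ['read paired', 'properly paired', 'read reverse strand', 'second in pair']
```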

The 2nd mate's alignment starts 1 base earlier than the 1st mate's. By default, STAR does not allow such protruding alignments.
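To make the "1 base earlier" concrete, the amount of protrusion can be computed from the POS and CIGAR fields of the two records: how far the plus-strand mate's start (or end) lies downstream of the minus-strand mate's start (or end). This is a hypothetical Python sketch modeled on the STAR manual's wording for protruding ends, not STAR's own code. For this pair the starts differ by 1 and the ends coincide, giving a protrusion of 1 base, which exceeds the default limit of 0.

```python
import re

def reference_span(cigar: str) -> int:
    """Reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in "MDN=X")

def alignment_end(pos: int, cigar: str) -> int:
    """1-based inclusive end coordinate of an alignment starting at pos."""
    return pos + reference_span(cigar) - 1

def protrusion(plus_pos: int, plus_cigar: str,
               minus_pos: int, minus_cigar: str) -> int:
    """Bases by which the plus-strand mate's start (or end) lies downstream
    of the minus-strand mate's start (or end); 0 means no protrusion.
    Hypothetical helper, not taken from STAR's source."""
    start_over = plus_pos - minus_pos
    end_over = (alignment_end(plus_pos, plus_cigar)
                - alignment_end(minus_pos, minus_cigar))
    return max(0, start_over, end_over)

# POS and CIGAR from the two records above (FLAG 99 = mate 1 on plus strand,
# FLAG 147 = mate 2 on minus strand).
p = protrusion(43398113, "27M29960N59M", 43398112, "28M29960N59M")
print(p)  # 1
```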

There is an option to allow them: --alignEndsProtrude 10 ConcordantPair, where the first value is the maximum number of protruding bases allowed (0 by default) and the second word tells STAR whether to mark such alignments as concordant (ConcordantPair) or discordant (DiscordantPair).
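A minimal Python sketch of how this might be wired into a paired-end run. The helper names, the index path, and the FASTQ file names are hypothetical placeholders; only the STAR flags `--genomeDir`, `--readFilesIn` and `--alignEndsProtrude` are real. `pair_status` is a toy model of the two words of the option as described above, and it deliberately ignores every other criterion STAR applies to a pair.

```python
import shlex

VALID_MODES = ("ConcordantPair", "DiscordantPair")

def star_pe_args(genome_dir: str, read1: str, read2: str,
                 max_protrude: int = 10, mode: str = "ConcordantPair") -> list:
    """Build an argv list for a paired-end STAR run allowing protruding ends."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if max_protrude < 0:
        raise ValueError("max_protrude must be non-negative")
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--alignEndsProtrude", str(max_protrude), mode,
    ]

def pair_status(protrusion: int, max_protrude: int = 0,
                mode: str = "ConcordantPair") -> str:
    """Toy model: what the option decides for a pair with this protrusion."""
    if protrusion <= 0:
        return "not affected"
    if protrusion > max_protrude:
        return "rejected"
    return "concordant" if mode == "ConcordantPair" else "discordant"

# Hypothetical paths; pass args to subprocess.run(args, check=True) to execute.
args = star_pe_args("/path/to/hg19_index", "R1.fastq", "R2.fastq")
print(shlex.join(args))
print(pair_status(1))                    # default limit 0: rejected
print(pair_status(1, max_protrude=10))   # concordant
```

Building an argv list rather than a single shell string avoids quoting problems if file names contain spaces.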

However, there was a bug in the implementation of this option that prevented this alignment from being output.

I have fixed it, and STAR now outputs the alignment as above. Please check out the GitHub master: https://github.com/alexdobin/STAR/archive/master.zip

Cheers

Alex
