Hi
I recently ran into a case where STAR aligns paired-end reads correctly when each mate is run in single-end mode, whereas paired-end mode always misaligns the second mate. Here is an example:
R1:
@seq1
GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
R2:
@seq1
CCGATGGATGGTACTTCTTCTGCTCTCAGCATGGGACCTTTTGTTCCCTTCATTATGAGTTGTGGTCAATCACCTGTCTACCACTCC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
Note that R2 is the reverse complement of R1; otherwise they are exactly the same sequence.
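A quick way to confirm the reverse-complement relationship on the exact records pasted above is a small check like the one below. It is only a sketch: the function names are made up for illustration, and it assumes plain A/C/G/T/N bases. It reports both read lengths as well as any mismatching positions, which is worth looking at because the two CIGAR strings quoted in this post sum to 86 and 87 bases respectively.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (assumes only A, C, G, T, N)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


def compare_mates(r1, r2):
    """Return (len(r1), len(r2), diffs), where diffs lists the 0-based
    positions at which r2 differs from the reverse complement of r1.
    Only the overlapping prefix is compared if the lengths differ."""
    rc = reverse_complement(r1)
    diffs = [i for i, (a, b) in enumerate(zip(rc, r2)) if a != b]
    return len(r1), len(r2), diffs


# The two reads exactly as pasted in this post.
r1 = "GAGTGGTAGACAGGTGAGTGACCACAACTCATAATGAAGGGAACAAAAGGTCCCATGCTGAGAGCAGAAGAAGTACCATCCATCGG"
r2 = "CCGATGGATGGTACTTCTTCTGCTCTCAGCATGGGACCTTTTGTTCCCTTCATTATGAGTTGTGGTCAATCACCTGTCTACCACTCC"

print(compare_mates(r1, r2))
```

An empty list of differences together with equal lengths would confirm that the mates are exact reverse complements of each other.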
Both mates should align to THADA (NM_001083953), across the junction between exons 28 and 29. However, when I align the pair to hg19 with STAR's default options, R1 is always aligned correctly and R2 is always aligned incorrectly. The correct alignment for these two reads is at chr2:43625252 with the CIGAR 27M29960N59M. Instead, the second mate is always placed at chr2:43655230 with soft-clipping (19S68M). In other words, rather than splicing across the junction, STAR extends the second mate into the intron, accepting mismatches and soft-clipping.
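To make the difference between the two alignments concrete, the reference blocks covered by each CIGAR can be worked out with a few lines of Python. This is a minimal sketch using the standard SAM meaning of the CIGAR operations (M, D, N, =, X consume the reference; S and I do not); the helper name is hypothetical and positions are treated as 1-based, as in SAM.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference intervals for the
    aligned (M, =, X) segments of an alignment starting at `pos`."""
    blocks = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            blocks.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that advance along the reference
            ref += length
    return blocks


print(aligned_blocks(43625252, "27M29960N59M"))
# [(43625252, 43625278), (43655239, 43655297)]
print(aligned_blocks(43655230, "19S68M"))
# [(43655230, 43655297)]
```

Both alignments end at the same base (43655297), but the 19S68M alignment starts at 43655230, nine bases upstream of the block that begins at 43655239 in the spliced alignment. So 9 of its 68 aligned bases lie in the intron, and the remaining 19 bases are clipped instead of being spliced to the upstream exon.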
Surprisingly, when I run STAR in single-end mode on each mate separately, both reads are aligned correctly. Does this imply that STAR assumes the two mates of a pair cannot be identical in sequence? If so, is there a way around it?
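For anyone who wants to reproduce the comparison, the three runs can be sketched as below. This only builds and prints the command lines; nothing is executed. The index directory, FASTQ file names, and output prefixes are hypothetical placeholders, and only the STAR parameters --genomeDir, --readFilesIn, and --outFileNamePrefix are set, with everything else left at its default.

```python
import shlex


def star_command(genome_dir, reads, prefix):
    """Build a STAR command line; one FASTQ means single-end, two means paired-end."""
    return ["STAR",
            "--genomeDir", genome_dir,
            "--readFilesIn", *reads,
            "--outFileNamePrefix", prefix]


GENOME_DIR = "hg19_star_index"  # hypothetical path to a STAR index of hg19

runs = [
    star_command(GENOME_DIR, ["R1.fastq", "R2.fastq"], "paired_"),  # paired-end
    star_command(GENOME_DIR, ["R1.fastq"], "single_R1_"),           # R1 alone
    star_command(GENOME_DIR, ["R2.fastq"], "single_R2_"),           # R2 alone
]

for cmd in runs:
    print(shlex.join(cmd))
```

Comparing the Aligned.out.sam files from the three prefixes should show whether the position and CIGAR of the second mate change between modes.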
Thanks a lot,
Saeed