Hi Alex,
I've been realigning a number of TopHat-aligned BAMs with Star 2-pass and checking individual sites to optimize the STAR parameters. I haven't been able to get Star to align an event that we are pretty certain is real and that aligns with TopHat. We were hoping you could help out!
The event is the splicing in of an intron at chr21:47409808-47409879. Here is a Sashimi plot comparison between TopHat and Star 2 pass: http://www.broadinstitute.org/~berylc/Star2PassForum/Img1_StarPost.png
We believe the event is real because the left-hand side of the inclusion is aligned by both aligners (the sequence of the exons in the region is unique if you look at UCSC) and because the sample carries a splice-creating intronic variant 3' of the inclusion.
All IGV screenshots are showing only uniquely aligning reads, filtering out PCR duplicates, secondary and supplementary alignments.
I’ve looked at where reads that are correctly aligned in Tophat are going with Star. A large portion of the reads are getting placed correctly at the 3' end of this inclusion with Star 2 Pass but instead of aligning the rest of the read to the adjacent canonical exon as splicing, the bases are getting softclipped. Here is a shot of this happening (these are all reads that are aligning uniquely to the adjacent exon in Tophat):
http://www.broadinstitute.org/~berylc/Star2PassForum/Img2_StarPost.png
Here’s a comparison of the CIGAR strings in TopHat and Star2Pass for the same read
TopHat
Alignment start = 47,409,821 (-)
Cigar = 59M292N17M
Star 2 pass
Location = chr21:47,409,829
Alignment start = 47,409,821 (-)
Cigar = 60M16S
(Star aligns one extra base past the splice site and soft-clips the rest; the adjacent exon starts with a G.)
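To make the difference between the two CIGAR strings concrete, here is a minimal Python sketch that walks a CIGAR string from its alignment start and reports the aligned bases, soft-clipped bases, and any spliced gaps (N operations). The helper name `summarize_cigar` is made up for illustration, and coordinates are treated as 1-based inclusive, which is an assumption about how the viewer reports them.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def summarize_cigar(cigar, start):
    """Summarize a CIGAR string for an alignment starting at `start` (1-based)."""
    pos = start
    aligned = 0
    clipped = 0
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":        # consumes read and reference
            aligned += n
            pos += n
        elif op == "D":        # consumes reference only
            pos += n
        elif op == "N":        # spliced gap, recorded as (first, last) skipped base
            introns.append((pos, pos + n - 1))
            pos += n
        elif op == "S":        # soft clip: consumes read only
            clipped += n
        # I, H and P do not advance the reference position
    return {
        "aligned": aligned,
        "soft_clipped": clipped,
        "introns": introns,
        "ref_end": pos - 1,
    }

tophat = summarize_cigar("59M292N17M", 47409821)
star = summarize_cigar("60M16S", 47409821)
print(tophat)
print(star)
```

Under these assumptions the TopHat alignment places a 292 bp gap starting at 47,409,880, the base right after the end of the inclusion (47,409,879), while the Star alignment reads one base into that gap and leaves 16 bases soft-clipped.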
I’ve tried forcing --alignEndsType EndToEnd. This doesn’t change where the alignment is placed; it just stops the reads from being soft-clipped, so the CIGAR string becomes 76M (with the alignment still starting at 47,409,821). Here’s a screenshot of this:
http://www.broadinstitute.org/~berylc/Star2PassForum/Img3_StarPost.png
Another example of the same thing
TopHat
Alignment start = 47,409,813 (-)
Cigar = 67M292N9M
Star 2 Pass
Alignment start = 47,409,813
Cigar = 68M8S
Here are my general parameters
-76 bp unstranded paired-end reads using genome generated with --sjdbOverhang 75
--outFilterMismatchNoverReadLmax 0.1 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --sjdbOverhang 75 --outFilterScoreMinOverLread 0.33 --outFilterMatchNminOverLread 0.33 --alignSoftClipAtReferenceEnds No --chimJunctionOverhangMin 15
I’ve made each of the following changes individually and rerun the aligner each time:
1) outFilterScoreMinOverLread to 0.66 (default)
2) outFilterMatchNminOverLread to 0.66 (default)
3) sjdbScore 0
4) All of the above
5) add --alignEndsType Extend5pOfRead1
6) add --alignEndsType EndToEnd
7) --alignSJDBoverhangMin 2
None of these options align this inclusion on the right hand side.
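For anyone wanting to reproduce the sweep, the seven runs above can be written down as overrides on top of the general parameters. This is only a sketch: the names `BASE`, `VARIANTS`, and `build_args` are hypothetical, it assumes "All of the above" means changes 1 to 3 combined, and it only assembles the option list (genome, input, and output arguments are deliberately left out).

```python
# General parameters as listed in the post.
BASE = {
    "--outFilterMismatchNoverReadLmax": "0.1",
    "--alignSJoverhangMin": "8",
    "--alignSJDBoverhangMin": "1",
    "--sjdbScore": "1",
    "--sjdbOverhang": "75",
    "--outFilterScoreMinOverLread": "0.33",
    "--outFilterMatchNminOverLread": "0.33",
    "--alignSoftClipAtReferenceEnds": "No",
    "--chimJunctionOverhangMin": "15",
}

# One entry per numbered change; each is applied on its own to BASE.
VARIANTS = {
    1: {"--outFilterScoreMinOverLread": "0.66"},
    2: {"--outFilterMatchNminOverLread": "0.66"},
    3: {"--sjdbScore": "0"},
    # Assumption: "All of the above" = changes 1-3 together.
    4: {
        "--outFilterScoreMinOverLread": "0.66",
        "--outFilterMatchNminOverLread": "0.66",
        "--sjdbScore": "0",
    },
    5: {"--alignEndsType": "Extend5pOfRead1"},
    6: {"--alignEndsType": "EndToEnd"},
    7: {"--alignSJDBoverhangMin": "2"},
}

def build_args(variant):
    """Return the flat option list for one variant (overrides replace BASE values)."""
    merged = {**BASE, **VARIANTS[variant]}
    args = []
    for flag, value in merged.items():
        args.extend([flag, value])
    return args

for number in sorted(VARIANTS):
    print(number, " ".join(build_args(number)))
```

Keeping each run as a small override makes it easy to see that only one setting differs from the baseline at a time, apart from run 4.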
It seems there are at least two problems. The first is that Star is soft-clipping the reads a little aggressively. Another issue could be that the exon downstream of the inclusion is quite small (26 bp), so reads mapping to the intron inclusion have to map to this exon and the next exon downstream, which means these reads are crossing multiple junctions. Is Star unforgiving about reads like these? Any comments would help!
Thanks so much!
Best,
Beryl