Soft Clipping Infrequently Done When Split Read is More Appropriate

Dario Strbenac

Sep 21, 2016, 9:00:13 PM
to rna-star

Sometimes STAR soft-clips a read even though the read substantially overlaps both sides of a splice junction. The screenshot compares TopHat2 (mapping done by the core facility) with STAR (mapping done by me).

For example,

Read name = 700666F:126:C8768ANXX:1:2314:7347:13898
Read length = 100bp
----------------------
Mapping = Primary @ MAPQ 255
Reference span = chr10:112,657,777-112,657,832 (-) = 56bp
Cigar = 44S56M
Clipping = Left 44 soft
----------------------
Location = chr10:112,657,763
----------------------
Mate is mapped = yes
Mate start = chr10:112655793 (+)
Insert size = -2039
First in pair
Pair orientation = F2R1
----------------------
NH = 1
HI = 1
nM = 3
AS = 148
-------------------
Alignment start position = chr10:112657777
TGTTTTCAGGCTGGAATAATTTCCAAACAACTCAGAGATCTTTGTCCTTCAAGGGGCAGAAAGCGTTTTGTAAGCGAAGGAGATGGAGGTCGTCTTAAAC

However, if I map it with BLAT, it clearly maps across the splice junction with 0 mismatches. I even used the GENCODE Genes 25 annotation when I created the STAR reference, to get the best junction alignments. Why is this read soft-clipped when other similar reads are split across the junction?
30588WD_PRELog.out

Alexander Dobin

Sep 23, 2016, 4:10:17 PM
to rna-star
Hi Dario,

if I map this read alone, STAR finds the spliced alignment:
1       0       10      112655793       255     53M1940N47M     *       0       0       TGTTTTCAGGCTGGAATAATTTCCAAACAACTCAGAGATCTTTGTCCTTCAAGGGGCAGAAAGCGTTTTGTAAGCGAAGGAGATGGAGGTCGTCTTAAAC    *       NH:i:1  HI:i:1  AS:i:99 nM:i:0

I think the problem is with the mapping of the second mate. My guess is that it is split inconsistently with this splice. Could you extract both mates from the BAM file and post them?

Cheers
Alex

Dario Strbenac

Sep 25, 2016, 10:00:08 PM
to rna-star
So that you can try the mapping yourself, here is the pair of reads from the two FASTQ files:

@700666F:126:C8768ANXX:3:2210:7554:18238 1:N:0:GTCCGC
GTTTAAGACGACCTCCATCTCCTTCGCTTACAAAACGCTTTCTGCCCCTTGAAGGACAAAGATCTCTGAGTTGTTTGGAAATTATTCCAGCCTGAAAACA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG>F>GGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCEGGGGGGGGG

@700666F:126:C8768ANXX:3:2210:7554:18238 2:N:0:GTCCGC
GTTTTCAGGCTGGAATAATTTCCAAACAACTCAGAGATCTTTGTCCTTCAAGGGGCAGAAAGCGTTTTGTAAGCGAAGGAGATGGAGGTCGTCTTAAACA
+
BCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGG

Alexander Dobin

Sep 28, 2016, 3:59:07 PM
to rna-star
Hi Dario,

If I map these reads separately, I get:
1       16      10      112655793       255     53M1940N47M
and
1       0       10      112655794       255     52M1940N47M1S
The 1st mate maps to the - strand, 1 base upstream (to the left) of the + strand 2nd mate.
STAR does not allow such configurations: the + strand mate has to be left-most. Granted, in this example it is only a 1-base shift that makes the alignment invalid.
I am planning to introduce a user-defined allowance for such weird overlaps.
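
For illustration, the check described above can be sketched in a few lines of Python. This is not STAR's actual code: the function names `reference_end` and `plus_mate_is_leftmost` are made up for this example, and the rule is simplified to comparing the leftmost mapped positions of the two mates.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference (soft clips and insertions do not).
REF_CONSUMING = set("MDN=X")


def reference_end(pos, cigar):
    """Return the 1-based inclusive end coordinate of an alignment."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def plus_mate_is_leftmost(plus_pos, minus_pos):
    """Simplified rule (an assumption, not STAR's implementation):
    the + strand mate must not start to the right of the - strand mate."""
    return plus_pos <= minus_pos


# The two single-end alignments quoted above.
minus_pos, minus_cigar = 112655793, "53M1940N47M"    # 1st mate, flag 16 (- strand)
plus_pos, plus_cigar = 112655794, "52M1940N47M1S"    # 2nd mate, flag 0 (+ strand)

print(reference_end(minus_pos, minus_cigar))   # 112657832
print(reference_end(plus_pos, plus_cigar))     # 112657832
print(plus_mate_is_leftmost(plus_pos, minus_pos))  # False: off by 1 base
```

Both mates end at the same coordinate, but the - strand mate starts 1 base to the left of the + strand mate, so the pair fails the simplified test.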

Cheers
Alex