Disable soft-clipping

Neelanjan Mukherjee

Nov 27, 2013, 9:04:55 AM
to rna-...@googlegroups.com
Dear Alex,
I am working with PAR-CLIP data, in which T-to-C mismatches are indicative of protein-RNA interaction sites. The vast majority of existing libraries have short reads, 36 or 50 nt long. I love STAR's speed and its ability to map to both the genome and the transcriptome, but the soft-clipping is problematic. I have looked at numerous examples, and when the T-to-C mismatch occurs in the last few positions of the read, the read gets soft-clipped at that position. We use the T-to-C mismatches as a quantitative signal to define binding sites at nucleotide resolution. So... is there any way to disable the soft-clipping? I can provide specific examples if that would help.
Best,
Neel
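A minimal Python sketch of why terminal soft-clipping hurts this kind of analysis: it counts T-to-C mismatches only within the aligned (M) part of a read, so a conversion at a clipped end is silently lost. `count_t_to_c` is a hypothetical helper written for illustration, not part of STAR. It assumes a CIGAR with only M and S operations (no indels or splices), a reference string positioned opposite the read's first base (clipped bases included), and it ignores strand (on minus-strand reads the same event appears as A-to-G in reference coordinates).

```python
import re

def count_t_to_c(read: str, ref: str, cigar: str) -> int:
    """Count reference-T / read-C mismatches in the aligned (M) bases only.

    Assumes the CIGAR uses only M and S (no indels or splices) and that
    ref[i] is the reference base opposite read[i], soft-clipped bases included.
    """
    if not re.fullmatch(r"(\d+[MS])+", cigar):
        raise ValueError(f"expected only M/S operations, got {cigar!r}")
    count = 0
    i = 0  # shared index into read and ref (valid because there are no indels)
    for length, op in re.findall(r"(\d+)([MS])", cigar):
        n = int(length)
        if op == "M":
            count += sum(1 for j in range(i, i + n) if ref[j] == "T" and read[j] == "C")
        # S bases are skipped: a conversion there never reaches the count
        i += n
    return count

ref = "ACGTACGTAT"
read = "ACGTACGTAC"  # T-to-C conversion at the last base
print(count_t_to_c(read, ref, "10M"))   # 1: the end-to-end alignment keeps it
print(count_t_to_c(read, ref, "9M1S"))  # 0: soft-clipping drops it
```

The same read yields a different T-to-C count depending only on whether its last base was clipped, which is exactly the loss of signal described above.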

Alexander Dobin

Dec 2, 2013, 6:18:34 PM
to rna-...@googlegroups.com
Hi Neel,

At the moment there is no good way to prevent soft-clipping in STAR. There have been a few requests for this feature, and I am thinking about a solution.
The main problem is as follows. Imagine that you have a 100b read that maps to an annotated splice junction with a short 4b overhang and no mismatches. STAR will map only 96b and soft-clip 4b, so this alignment will have a score of 100-4=96.
Now imagine that the same sequence can also be mapped to a pseudogene with 3 mismatches; that alignment will have a score of 97-3=94. Since 96-94>1, STAR will report only the first alignment.
On the other hand, if STAR is not allowed to soft-clip, the first alignment may have 4 mismatches and a score of 96-4=92, and STAR will therefore report the second alignment.
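The arithmetic above can be checked with a short Python sketch. It assumes the simplified scoring used in the example (+1 per matched base, -1 per mismatch, nothing for soft-clipped bases) and ignores STAR's other penalties; `score`, `reported`, and `score_range` are hypothetical names, with `score_range` standing in for the ">1" threshold in the example.

```python
def score(matches: int, mismatches: int) -> int:
    # Simplified scoring from the example: +1 per matched base,
    # -1 per mismatch; soft-clipped bases contribute nothing.
    return matches - mismatches

def reported(alignments: dict[str, int], score_range: int = 1) -> list[str]:
    # Keep only alignments whose score is within `score_range` of the best one.
    best = max(alignments.values())
    return [name for name, s in alignments.items() if best - s <= score_range]

# Soft-clipping allowed: 96 matched bases vs. pseudogene with 97 matches, 3 mismatches.
with_clip = {"junction (96M4S)": score(96, 0), "pseudogene": score(97, 3)}
# End-to-end: the 4 overhang bases become mismatches instead of being clipped.
no_clip = {"junction (end-to-end)": score(96, 4), "pseudogene": score(97, 3)}

print(reported(with_clip))  # ['junction (96M4S)']  (96 vs 94)
print(reported(no_clip))    # ['pseudogene']        (92 vs 94)
```

Forcing end-to-end alignment flips which locus wins, which is the pseudogene bias the example describes.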

This illustrates why soft-clipping is important for avoiding a bias of alignments towards pseudogenes, although the problem is not very severe if you are using annotations.

I am going to implement an option to enforce end-to-end alignments.
Another option is simply to replace the S operations in the CIGAR string by extending the M operations at the ends of the reads, without changing the local alignment logic. You can actually do this yourself by post-processing the STAR output files.

Cheers
Alex
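A minimal Python sketch of the post-processing suggested above, assuming you already have each record's CIGAR and POS fields (for example by splitting SAM text lines). `unclip` is a hypothetical helper, not part of STAR. Turning a leading soft-clip of length k into M moves POS left by k; a trailing clip only changes the CIGAR. The sketch does not update the NM/MD tags (or STAR's nM tag), so mismatch counts there become stale, and it does not check that the extended read stays within the chromosome end.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def unclip(cigar: str, pos: int) -> tuple[str, int]:
    """Replace terminal soft-clips (S) with matches (M) in a SAM record.

    `pos` is the 1-based leftmost mapping position (the SAM POS field).
    Returns the new (CIGAR, POS) pair.
    """
    if cigar == "*":  # unmapped read: nothing to do
        return cigar, pos
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    ops = [[int(n), op] for n, op in CIGAR_OP.findall(cigar)]

    # Soft-clips may sit just inside hard-clips (H), so look past those.
    non_h = [i for i, (_, op) in enumerate(ops) if op != "H"]
    if not non_h:
        return cigar, pos
    first, last = non_h[0], non_h[-1]
    if ops[first][1] == "S":
        pos -= ops[first][0]  # clipped bases now occupy reference positions
        ops[first][1] = "M"
    if ops[last][1] == "S":
        ops[last][1] = "M"
    if pos < 1:
        raise ValueError("read would extend past the start of the reference")

    # Merge neighbouring M operations produced by the conversion.
    merged: list[list] = []
    for n, op in ops:
        if merged and merged[-1][1] == "M" and op == "M":
            merged[-1][0] += n
        else:
            merged.append([n, op])
    return "".join(f"{n}{op}" for n, op in merged), pos

print(unclip("3S94M3S", 1001))       # ('100M', 998)
print(unclip("10S40M1000N50M", 20))  # ('50M1000N50M', 10)
```

Because the local alignment is left untouched, bases that were clipped for a good reason (adapter, splice overhang) simply become mismatches at the read ends, which is the trade-off described in the reply.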

Thomas van Gurp

Jul 22, 2016, 12:15:11 PM
to rna-star
Hi Alex,
Did the end-to-end alignment option get implemented?
Cheers,
Thomas

On Monday, December 2, 2013 19:18:34 UTC-4, Alexander Dobin wrote:

Alexander Dobin

Jul 22, 2016, 12:26:45 PM
to rna-star
Hi Thomas,

yes:
--alignEndsType EndToEnd

Cheers
Alex