Hello,
I have been using STAR for aligning CLIP-seq reads. It works fairly well: the sensitivity is excellent, and more reads are mapped than with TopHat for the same sample. However, I am trying to optimise my parameters to fix a problem I have noticed.
The problem is that CLIP reads are short (~30-50 nt) and frequently contain a small deletion (1-3 nt). I have observed that if this deletion is near either end of the read, it is not handled correctly and the read is soft-clipped (if in Local mode) or just mapped with mismatches (if in EndToEnd mode). Reducing the --scoreDelOpen parameter to -1, 0.1 or 0 made almost no difference.
Are there other parameters I can alter that might fix this?
The only other idea I can think of is to perform some sort of local realignment after the mapping stage, like people do for indel variant detection, but I don't know whether this is appropriate.
As an example, in the screenshot below, the tracks are TopHat2 (top), STAR default (2nd from top), STAR EndToEnd (3rd), and STAR EndToEnd with --scoreDelOpen 0 (bottom track).
You can see there is a selection of 1 nt deletions on the right of the top track (TopHat2) that aren't present in the STAR tracks. Also, there are soft-clipped reads in the 2nd track (shown as long stretches of mismatched bases, e.g. far left) that are mapped with mismatches in the 3rd and 4th tracks, rather than as matches with a small deletion.
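
In case it helps to put numbers on what the tracks show, here is a rough Python sketch that classifies alignments by their CIGAR string: soft-clipped at either end, a deletion close to a read end, or a deletion in the middle of the read. It is only an illustration, not part of STAR or TopHat. The function names and the `edge` cutoff of 8 nt are my own assumptions, and the CIGAR strings would have to come from column 6 of each aligner's SAM output.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5M1D30M' into [(5, 'M'), (1, 'D'), (30, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def classify(cigar, edge=8):
    """Return a set of labels describing one alignment.

    `edge` is a hypothetical cutoff (in nt): a deletion whose position in the
    read is within `edge` bases of either read end counts as 'near the end'.
    """
    # Hard clips are not part of the stored read sequence, so drop them.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    # Operations that consume read (query) bases.
    read_len = sum(n for n, op in ops if op in "MIS=X")

    labels = set()
    if ops and ops[0][1] == "S":
        labels.add("softclip_left")
    if ops and ops[-1][1] == "S":
        labels.add("softclip_right")

    pos = 0  # number of read bases consumed so far
    for n, op in ops:
        if op == "D":
            # Distance from the deletion to the nearest end of the read.
            dist = min(pos, read_len - pos)
            labels.add("del_near_end" if dist <= edge else "del_internal")
        elif op in "MIS=X":
            pos += n
    return labels


def summarise(cigars, edge=8):
    """Count how many alignments carry each label."""
    counts = Counter()
    for cigar in cigars:
        counts.update(classify(cigar, edge))
    return counts


if __name__ == "__main__":
    # Made-up CIGAR strings, for illustration only.
    example = ["35M", "3S32M", "5M1D30M", "20M1D20M", "30M4S"]
    print(summarise(example))
```

Running `summarise` on the CIGAR column from each track (for example, extracted with `samtools view file.bam | cut -f6`) should show whether one aligner reports deletions near read ends where the other reports soft clips.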
