I found STAR very attractive, specifically for the detection of fusion products.
I tested STAR on a data set made by combining the Edgren datasets BT-474, KPL-4, MCF-7 and SK-BR-3 (Edgren et al. Genome Biology 2011, 12:R6).
According to the paper, there are 27 experimentally validated fusions: 11 in BT-474, 10 in SK-BR-3, 3 in KPL-4 and 3 in MCF-7.
I used the following command to run STAR:
nohup /home/calogero/bin/STAR_2.3.0e/STAR --runThreadN 40 \
  --genomeDir /home/calogero/bin/genomes/hg19.star/ \
  --readFilesIn ../all_1.fq ../all_2.fq \
  --outFileNamePrefix ./output/ \
  --outFilterMismatchNmax 10 \
  --seedSearchStartLmax 30 \
  --chimSegmentMin 15 \
  --chimJunctionOverhangMin 15 &
Then I used the chimera package from Bioconductor (I am the maintainer) to annotate the fusion
events observed in the Chimeric.out.junction file.
I got a total of 55265 fusions. However, before applying any further filters to refine the
analysis, I checked how many of the experimentally validated fusions were detected.
I found only 2 out of 27. I am a bit disappointed and I would like to know if there is
any way to improve the sensitivity of the search.
Is there any parameter that should be tuned to improve fusion detection sensitivity?
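
For reference, here is a minimal sketch of how the per-read rows in Chimeric.out.junction could be collapsed into unique junctions with their supporting-read counts before comparing them with a validated list. This is only an illustration and not what the chimera package does internally. It assumes the first six tab-separated columns are donor chromosome, donor position, donor strand, acceptor chromosome, acceptor position and acceptor strand, as described in the STAR manual; the function names and the min_reads threshold are hypothetical.

```python
from collections import Counter


def count_junctions(lines):
    """Collapse per-read chimeric rows into unique junctions.

    Assumes (per the STAR manual) that columns 1-6 are: donor chromosome,
    donor position, donor strand, acceptor chromosome, acceptor position,
    acceptor strand. Returns a Counter mapping each junction to the number
    of rows (reads) that support it.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[0].startswith("#"):
            continue  # skip comments and truncated rows
        try:
            donor_pos = int(fields[1])
            acceptor_pos = int(fields[4])
        except ValueError:
            continue  # skip header-like rows with non-numeric positions
        key = (fields[0], donor_pos, fields[2],
               fields[3], acceptor_pos, fields[5])
        counts[key] += 1
    return counts


def filter_by_support(counts, min_reads=2):
    """Keep only junctions seen in at least min_reads rows (hypothetical cutoff)."""
    return {key: n for key, n in counts.items() if n >= min_reads}


# Usage (path taken from the --outFileNamePrefix used above):
# with open("./output/Chimeric.out.junction") as handle:
#     junctions = count_junctions(handle)
# supported = filter_by_support(junctions, min_reads=2)
```

Counting unique junctions rather than rows makes it easier to see whether a validated fusion is truly absent or only weakly supported.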
Cheers
Raffaele