Hi,
I am trying to detect chimeric fusions between a virus (integrated into the host genome) and the host genome. I aligned the reads (paired-end, 2x76, stranded) to a hybrid genome (host + virus) in which the virus genome is treated as an additional chromosome. Here is my command:
$STAR --genomeDir $stargenomeDir --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --seedSearchStartLmax 10 --outFilterMultimapNmax 10 --outFilterMismatchNmax 10 --chimSegmentMin 10 --outFilterMatchNmin 10 --chimJunctionOverhangMin 10 --readFilesIn $r1 $r2 --runThreadN $threads --outStd SAM --readFilesCommand zcat
Version: STAR_2.3.1u_r375
So I expect STAR to report fusion reads with at least 10 bases aligning to either the host or the virus, and the rest of the read aligning to the virus or the host, respectively. Schematically:
# : host genome
@ : virus genome
= : read
- : splicing
################################################@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
==========------------------------------------------------------------------------------------------------=========================
min 10bp
But when I look in IGV at the end of the virus genome, I see some reads with soft-clipped segments longer than 10 bp (in this case they are 13 bp long). When I align these soft-clipped bases to the host genome with blast, I find a position on the host genome where there is indeed evidence of a fusion transcript that continues past the breakpoint (I can clearly see reads aligned after the fusion breakpoint, representing the fusion transcript). But STAR does not report this fusion. Am I doing something wrong?
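In case it helps to reproduce the check, this is roughly how the soft-clipped bases can be pulled out of the SAM output before running blast. It is only a minimal Python sketch: the function names `soft_clips` and `clipped_fasta` are my own (not part of STAR), and it assumes standard SAM columns with the CIGAR in column 6 and the read sequence in column 10.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar, seq, min_len=10):
    """Return (side, clipped_bases) for each soft clip of at least min_len bases."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are not stored in SEQ, so ignore them here.
    core = [o for o in ops if o[1] != "H"]
    clips = []
    if core and core[0][1] == "S" and core[0][0] >= min_len:
        clips.append(("left", seq[:core[0][0]]))
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_len:
        clips.append(("right", seq[-core[-1][0]:]))
    return clips

def clipped_fasta(sam_lines, min_len=10):
    """Yield one FASTA record per soft-clipped read end found in SAM lines."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, chrom, pos, cigar, seq = (fields[0], fields[2], fields[3],
                                        fields[5], fields[9])
        if cigar == "*" or seq == "*":    # unmapped read or sequence not stored
            continue
        for side, clip in soft_clips(cigar, seq, min_len):
            yield f">{name}|{chrom}:{pos}|{side}\n{clip}"
```

The records it yields can be written to a FASTA file and aligned to the host genome with blast; a read with CIGAR 13S63M, for example, gives one record holding its first 13 bases.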
Here are two figures illustrating my case.
Alignment on the virus :
https://s15.postimg.org/gsoth8b0r/igv_snapshot2.jpg
Alignment on the host :
https://s16.postimg.org/wogfzbi45/igv_snapshot1.jpg
Thanks