STAR cannot detect this chimeric read


Nico

Aug 26, 2016, 6:51:46 AM
to rna-...@googlegroups.com


Hi,

I am trying to detect chimeric fusions between a virus (integrated into the host genome) and the host genome. I aligned the reads (paired-end, 2x76, stranded) to a hybrid genome (host + virus) in which the virus genome is treated as an additional chromosome. Here's my command:

$STAR --genomeDir $stargenomeDir --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --seedSearchStartLmax 10 --outFilterMultimapNmax 10 --outFilterMismatchNmax 10 --chimSegmentMin 10 --outFilterMatchNmin 10 --chimJunctionOverhangMin 10 --readFilesIn $r1 $r2 --runThreadN $threads --outStd SAM --readFilesCommand zcat

Version: STAR_2.3.1u_r375

So I expect STAR to report fusion reads with at least 10 bases aligned to either the host or the virus, and the rest aligned to the virus or the host, respectively, as in:

# : host genome
@ : virus genome
= : read
- : splicing

################################################@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
           ==========------------------------------------------------------------------------------------------------=========================
               min 10bp

But when I look at the end of the virus genome in IGV, I see some reads with soft-clipped segments longer than 10 bp (in these cases, 13 bp). When I align these soft-clipped bases to the host genome with BLAST, I find a position in the host genome where there are indeed traces of a fusion transcript that continues (I can clearly see reads aligned after the fusion breakpoint, representing the fusion transcript). But STAR does not report this fusion. Am I doing something wrong?

I've attached two figures illustrating my case:

Alignment on the virus : https://s15.postimg.org/gsoth8b0r/igv_snapshot2.jpg
Alignment on the host : https://s16.postimg.org/wogfzbi45/igv_snapshot1.jpg

Thanks
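Reads like the ones described above can also be listed without scrolling through IGV, because the soft-clipped lengths are encoded in the CIGAR field of the SAM output (`S` operations at either end). A minimal Python sketch, assuming the (read name, CIGAR) pairs have already been extracted from the SAM; `soft_clips`, `long_clips` and the example records are hypothetical, not part of STAR:

```python
import re

# One CIGAR operation: a length followed by an op letter (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (leading, trailing) soft-clip lengths of a SAM CIGAR string."""
    # Hard clips (H) can only sit outside soft clips, so ignore them here.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    lead = ops[0][0] if ops[0][1] == "S" else 0
    # Guard against counting a lone "S" op as both leading and trailing.
    trail = ops[-1][0] if ops[-1][1] == "S" and len(ops) > 1 else 0
    return (lead, trail)


def long_clips(records, min_clip=10):
    """Yield (name, leading, trailing) for reads with a soft clip longer than min_clip."""
    for name, cigar in records:
        lead, trail = soft_clips(cigar)
        if lead > min_clip or trail > min_clip:
            yield name, lead, trail


# Hypothetical records: "63M13S" mimics a 76 b mate whose last 13 b are clipped.
reads = [("r1", "63M13S"), ("r2", "76M"), ("r3", "5S71M")]
for name, lead, trail in long_clips(reads):
    print(name, lead, trail)  # prints: r1 0 13
```

The clipped bases themselves can then be cut from the SEQ field and queried with BLAST, as in the question.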


Alexander Dobin

Aug 29, 2016, 1:05:07 PM
to rna-star
Hi Nico,

Finding a 10-base chimeric overhang is very hard, since a 10-mer will map about 6000 times to the human genome (assuming random sequence),
so it's very unlikely that you will find a unique match.

Cheers
Alex
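Alex's ~6000 figure is a back-of-the-envelope estimate: in a genome of G bases read on both strands there are about 2G positions, and a uniformly random k-mer matches any one of them with probability 1/4^k, so the expected number of chance hits is 2G / 4^k. A small Python sketch of that arithmetic, assuming G = 3.1e9 (an approximate human haploid genome size, not stated in the thread) and ignoring the k - 1 positions lost at chromosome ends; the function names are hypothetical:

```python
def expected_random_hits(k, genome_len=3.1e9, both_strands=True):
    """Expected exact occurrences of a k-mer in a genome of uniformly random A/C/G/T."""
    positions = genome_len * (2 if both_strands else 1)
    # Each position matches a given k-mer with probability 1 / 4**k.
    return positions / 4 ** k


def min_unique_k(genome_len=3.1e9, both_strands=True):
    """Smallest k for which fewer than one random hit is expected."""
    k = 1
    while expected_random_hits(k, genome_len, both_strands) >= 1:
        k += 1
    return k


for k in (10, 13, 17):
    print(k, f"{expected_random_hits(k):.1f}")  # 10 5912.8 / 13 92.4 / 17 0.4
print("min k for < 1 expected hit:", min_unique_k())  # 17
```

Under this model, even the 13 b clips from the question would be expected to match ~90 times by chance, and a unique placement only becomes likely around 17 b. Real genomes are not random, so the count for a particular sequence can be much higher (repeats) or lower; treat this as an order-of-magnitude argument.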

Nico

Aug 29, 2016, 3:08:18 PM
to rna-...@googlegroups.com
Hi Alex

That's what I was thinking, and you are right: when I ran BLAST I found a lot of possible positions in the genome.

Thanks


> Edit: I found only 3 possible positions with a perfect match in the host genome. I guess chimeric segments must not be multi-mapped in order to be reported.
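Whether STAR requires a uniquely placed chimeric segment is Nico's guess and is not confirmed in this thread, so the sketch below does not describe STAR's internal filter. It only illustrates the check he did by hand with BLAST: counting exact, ungapped matches of a short segment on both strands and testing whether there is exactly one. `revcomp`, `exact_hits` and the toy reference are hypothetical; a real genome would need an index (or BLAST itself) rather than a linear scan:

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def exact_hits(query, chromosomes):
    """List (chrom, 0-based start, strand) for every exact match on either strand."""
    q = query.upper()
    hits = []
    for strand, pattern in (("+", q), ("-", revcomp(q))):
        for name, seq in chromosomes.items():
            seq = seq.upper()
            start = seq.find(pattern)
            while start != -1:  # step by 1 so overlapping matches are also found
                hits.append((name, start, strand))
                start = seq.find(pattern, start + 1)
    return hits


# Toy reference standing in for host + virus chromosomes.
toy_ref = {"chrA": "ACGTTTGACCA", "chrB": "TGGTCAAA"}
hits = exact_hits("TTGACC", toy_ref)
print(hits)  # [('chrA', 4, '+'), ('chrB', 1, '-')]
print("unique" if len(hits) == 1 else "multi-mapped")  # multi-mapped
```

With 3 perfect matches, as found here for the 13 b clip, such a segment would count as multi-mapped under this check.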
