STAR sensitivity in fusion detection

Raffaele Calogero

Apr 18, 2013, 8:05:34 AM
to rna-...@googlegroups.com
I find STAR very attractive, specifically for the detection of fusion products.
I tested STAR on a data set made by combining the Edgren datasets BT-474, KPL-4, MCF-7 and SK-BR-3 (Edgren et al. Genome Biology 2011, 12:R6).
According to the paper, there are 27 experimentally validated fusions: 11 in BT-474, 10 in SK-BR-3, 3 in KPL-4 and 6 in MCF-7.

I used the following command to run STAR:

nohup /home/calogero/bin/STAR_2.3.0e/STAR   --runThreadN 40   --genomeDir /home/calogero/bin/genomes/hg19.star/   --readFilesIn ../all_1.fq   ../all_2.fq      --outFileNamePrefix ./output/   --outFilterMismatchNmax 10   --seedSearchStartLmax 30   --chimSegmentMin 15  --chimJunctionOverhangMin 15 &

Then I used the chimera package from Bioconductor (I am the maintainer) to annotate the fusion events observed in the Chimeric.out.junction file.
I got a total of 55265 fusions. However, before applying any other filters to refine the analysis, I checked how many of the experimentally validated fusions were detected. I found only 2 out of 27. I am a bit disappointed and I would like to know if there is any way to improve the sensitivity of the search.
Is there any parameter that should be tuned to improve fusion detection sensitivity?
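
The recovery check described above can be sketched in a few lines of Python. This is only an illustration, not the chimera package's method. It assumes that the first six tab-separated columns of Chimeric.out.junction are donor chromosome, donor position, donor strand, acceptor chromosome, acceptor position and acceptor strand. The function names, the breakpoint tolerance, the fusion names and all coordinates below are hypothetical toy values, not the Edgren breakpoints.

```python
import io
from collections import Counter


def load_junction_counts(handle):
    """Count supporting reads per (donor chr, donor pos, acceptor chr, acceptor pos).

    Assumes columns 1-6 are: donor chr, donor pos, donor strand,
    acceptor chr, acceptor pos, acceptor strand (an assumption about the file layout).
    """
    counts = Counter()
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        try:
            key = (fields[0], int(fields[1]), fields[3], int(fields[4]))
        except ValueError:
            continue  # skip lines whose positions are not integers (e.g. a header)
        counts[key] += 1
    return counts


def recovered_fusions(validated, counts, tolerance=5):
    """Return the validated fusions that have a junction within `tolerance`
    bases of both breakpoints, in either donor/acceptor orientation."""
    hits = []
    for name, (chr_a, pos_a, chr_b, pos_b) in validated.items():
        for chr1, pos1, chr2, pos2 in counts:
            forward = (chr1 == chr_a and chr2 == chr_b
                       and abs(pos1 - pos_a) <= tolerance
                       and abs(pos2 - pos_b) <= tolerance)
            reverse = (chr1 == chr_b and chr2 == chr_a
                       and abs(pos1 - pos_b) <= tolerance
                       and abs(pos2 - pos_a) <= tolerance)
            if forward or reverse:
                hits.append(name)
                break
    return hits


# Toy input standing in for Chimeric.out.junction (hypothetical values,
# truncated after the read name because only the first six columns are used).
toy_junctions = io.StringIO(
    "chr1\t1000\t+\tchr2\t5000\t-\t1\t0\t0\tread_a\n"
    "chr1\t1000\t+\tchr2\t5000\t-\t1\t0\t0\tread_b\n"
    "chr3\t200\t+\tchr3\t90000\t+\t-1\t0\t0\tread_c\n"
)
# Hypothetical validated breakpoints: name -> (chr A, pos A, chr B, pos B).
validated = {
    "example_fusion_1": ("chr1", 1002, "chr2", 4998),
    "example_fusion_2": ("chr5", 10, "chr6", 20),
}

counts = load_junction_counts(toy_junctions)
hits = recovered_fusions(validated, counts)
sensitivity = len(hits) / len(validated)
```

With real data, `toy_junctions` would be replaced by an open handle to the Chimeric.out.junction file and `validated` by the published breakpoint coordinates. The tolerance is a judgment call, since reported breakpoints and aligner junction positions can differ by a few bases.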
Cheers
Raffaele

Alexander Dobin

Apr 18, 2013, 12:24:19 PM
to rna-...@googlegroups.com
Hi Raffaele,

I have not personally done much work with chimeric junctions beyond checking that I can recover BCR-ABL from K562.
Nicolas Stransky posted in this group about using STAR for chimeric detection; he might be able to comment on STAR's sensitivity.
He also found several problems with chimeric output that I think I fixed, so if you could re-run your analysis with the latest patch, it would be great.

Also, I am very interested in checking this issue myself. Could you please point me to the .fastq files for one of the samples (say with the largest number of validated chimeras),
and also to the list of validated chimeric junction coordinates - that would save me some time from reading the whole paper. :)

Cheers
Alex

Raffaele Calogero

Apr 19, 2013, 1:59:09 AM
to rna-...@googlegroups.com

Hi Alex,
many thanks for the quick answer.
Concerning the fastq files and the coordinates of the fusions: due to the security policy of my University I am unable to open an ftp site for the data. However, if you send me (raffaele...@unito.it) an address where I can send the data,
I will put them on a USB stick and ship them to you on Monday by DHL.

I will run the analysis with the patch and let you know as soon as it is finished.
many thanks again
Raf
 

Alexander Dobin

May 6, 2013, 10:58:57 PM
to rna-...@googlegroups.com
Hi Raghu,

Raffaele and I exchanged several e-mails on this topic. His latest message was: "From my analysis STAR patched detected 13 out of the 19 fusions that are usually detected by other tools, which is not too bad!"
I will try to play with the STAR parameters to further increase sensitivity next week, after the Biology of Genomes meeting.

Cheers
Alex

On Friday, May 3, 2013 6:37:29 PM UTC-4, Raghu Prasad Rao Metpally wrote:
Hi Raffaele and Alex,

Both are interesting and exciting tools.

Were you able to solve the above issue with the chimera run?

I am curious to start the STAR+chimera runs on our RNA-seq data sets.

Raffaele, it would be great if you could post the chimera commands you used.

Thanks
Raghu


Raghu Prasad Rao Metpally

May 8, 2013, 12:35:35 PM
to rna-...@googlegroups.com
Thanks for the reply.

It is interesting to know how many fusions it picked up.

I will look forward to your update.

Best,
Raghu

Santosh Anand

Jan 20, 2014, 6:37:28 AM
to rna-...@googlegroups.com
Dear All, 

Any updates on the chimera analysis? For example, which parameters should be tuned for the best results on human data?

Also, I'd like to know whether the latest alpha release is good to use for the chimera analysis.

Thanks in advance!
santosh 

Santosh Anand

Jan 22, 2014, 9:45:51 AM
to rna-...@googlegroups.com
With many alpha releases (checked with STAR-2.3.1s, STAR-2.3.1v and STAR-2.3.1x), STAR quits with the following FATAL error when chimeric detection is switched on (--chimSegmentMin 20):

--
EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@HWUSI-EAS1825_0012_FC:1:33:16212:15893#0/2
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=50000
--

This error does not occur with the last stable release. I wonder if there is any difference (or benefit) between the last stable release and any of the alphas as far as chimera analysis is concerned?

Thanks!

Alexander Dobin

Jan 22, 2014, 3:45:05 PM
to rna-...@googlegroups.com
Hi Santosh,

I have tested the latest patches with chimeric output and did not see this problem.
This looks like a problem with the input file, although it is strange that it appears only when you switch on chimeric detection.
Can you cut the first few reads from your files and send them to me?
The latest patches are generally highly recommended over the 2.3.0e release.

Cheers
Alex

Santosh Anand

Feb 7, 2014, 2:21:15 PM
Hi Alex,

Sorry for the late follow-up. I think I know the problem now. The offending reads are all those with an empty sequence entry (they were trimmed for adapters, so this can happen). For example, one of the offending reads is:

------read start----
@HWUSI-EAS1825_0012_FC:2:1:15636:1089#0/1

+

------read-end---

The last stable version of STAR seems to tolerate this, but the alpha releases throw a fatal error (I just tried the latest alpha-z patch too). In my opinion, this deserves at most a warning rather than stopping the whole program with a fatal error.
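
As a possible workaround until the aligner tolerates such records, the empty reads could be removed before mapping. The following is a minimal Python sketch under several assumptions: the FASTQ files use strict 4-line records, both mate files list reads in the same order, and a pair is dropped whenever either mate is shorter than `min_len`. The function names and the toy reads are hypothetical, and none of this reflects how STAR itself handles such reads.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Yield (header, sequence, separator, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return  # end of file
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record: %r" % (lines,))
        yield tuple(lines)


def drop_short_pairs(in1, in2, out1, out2, min_len=1):
    """Copy only pairs in which both mates have at least `min_len` bases.

    Dropping both mates together keeps the two output files in sync.
    Returns (kept, dropped) pair counts.
    """
    kept = dropped = 0
    for rec1, rec2 in zip(fastq_records(in1), fastq_records(in2)):
        if len(rec1[1]) < min_len or len(rec2[1]) < min_len:
            dropped += 1
            continue
        out1.write("\n".join(rec1) + "\n")
        out2.write("\n".join(rec2) + "\n")
        kept += 1
    return kept, dropped


# Toy in-memory example (hypothetical reads); the second pair has an empty mate 1,
# like the adapter-trimmed read shown above.
mate1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\n\n+\n\n")
mate2 = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nGGCC\n+\nIIII\n")
out1, out2 = io.StringIO(), io.StringIO()

kept, dropped = drop_short_pairs(mate1, mate2, out1, out2)
```

For real files, the `io.StringIO` objects would be replaced by handles opened on the trimmed mate files, and `min_len` could be raised if very short reads are also unwanted.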

Thanks for your time and help!
santosh

Daniel Nicorici

May 26, 2014, 3:16:36 AM
to rna-...@googlegroups.com
Just one quick comment.

The Edgren dataset contains 40 known fusion genes which are RT-PCR validated and not 27!!! You missed 13 of them for some reason!
See here for more info:
