STAR sensitivity in fusion detection

Raffaele Calogero

Apr 18, 2013, 8:05:34 AM

to rna-...@googlegroups.com

I find STAR very attractive, specifically for the detection of fusion products.
I tested STAR on a data set made by combining the Edgren datasets BT-474, KPL-4, MCF-7 and SK-BR-3 (Edgren et al. Genome Biology 2011, 12:R6).
According to the paper, there are 27 experimentally validated fusions: 11 in BT-474, 10 in SK-BR-3, 3 in KPL-4 and 6 in MCF-7.

I used the following command to run STAR:

nohup /home/calogero/bin/STAR_2.3.0e/STAR --runThreadN 40 --genomeDir /home/calogero/bin/genomes/hg19.star/ --readFilesIn ../all_1.fq ../all_2.fq --outFileNamePrefix ./output/ --outFilterMismatchNmax 10 --seedSearchStartLmax 30 --chimSegmentMin 15 --chimJunctionOverhangMin 15 &

Then I used the chimera package from Bioconductor (I am the maintainer) to annotate the fusion events observed in the Chimeric.out.junction file.
I got a total of 55265 fusions. However, before applying any further filters to refine the analysis, I checked how many of the experimentally validated fusions were detected. I found only 2 out of 27. I am a bit disappointed and I would like to know if there is any way to improve the sensitivity of the search.
Is there any parameter that should be tuned to improve fusion detection sensitivity?
Cheers
Raffaele
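
A minimal sketch of the recall check described above (how many validated fusions are supported by at least one line of Chimeric.out.junction). It assumes the first six tab-separated columns are donor chromosome, donor breakpoint, donor strand, acceptor chromosome, acceptor breakpoint and acceptor strand, as documented in the STAR manual. The function names, the `tolerance` window and the coordinates in the example are hypothetical, and this is not how the chimera package performs its annotation.

```python
import io
from typing import Iterable, List, Tuple

# (donor_chr, donor_pos, acceptor_chr, acceptor_pos); hypothetical record shape
Junction = Tuple[str, int, str, int]


def read_junctions(handle: Iterable[str]) -> List[Junction]:
    """Parse breakpoint pairs from tab-separated Chimeric.out.junction lines."""
    junctions = []
    for line in handle:
        row = line.rstrip("\r\n").split("\t")
        if len(row) < 6:
            continue  # skip blank or truncated lines
        try:
            junctions.append((row[0], int(row[1]), row[3], int(row[4])))
        except ValueError:
            continue  # skip header or comment lines with non-numeric positions
    return junctions


def matches(found: Junction, known: Junction, tolerance: int) -> bool:
    """True if both breakpoints agree within `tolerance` bases, in either orientation."""
    def same(a: Junction, b: Junction) -> bool:
        return (a[0] == b[0] and abs(a[1] - b[1]) <= tolerance
                and a[2] == b[2] and abs(a[3] - b[3]) <= tolerance)

    swapped = (found[2], found[3], found[0], found[1])
    return same(found, known) or same(swapped, known)


def recall(found: List[Junction], validated: List[Junction], tolerance: int = 10) -> float:
    """Fraction of validated junctions supported by at least one detected junction."""
    if not validated:
        return 0.0
    hits = sum(any(matches(f, k, tolerance) for f in found) for k in validated)
    return hits / len(validated)


# Made-up coordinates for illustration only, not taken from the Edgren data.
example = io.StringIO(
    "chr1\t1000\t+\tchr2\t5000\t-\t1\t0\t0\tread1\t900\t50M\t5000\t50M\n"
    "chr3\t200\t+\tchr4\t300\t+\t0\t0\t0\tread2\t150\t50M\t300\t50M\n"
)
validated = [("chr1", 1003, "chr2", 5000), ("chr5", 10, "chr6", 20)]
print(recall(read_junctions(example), validated))
```

Matching on breakpoint coordinates with a small tolerance, rather than on gene names, avoids depending on any particular annotation, but the right tolerance depends on how the validated coordinates were reported.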

Alexander Dobin

Apr 18, 2013, 12:24:19 PM

to rna-...@googlegroups.com

Hi Raffaele,

I have not personally done much work with chimeric junctions beyond checking that I can recover BCR-ABL from K562.

Nicolas Stransky posted in this group that he was trying to use STAR for chimeric detection; he might be able to comment on STAR's sensitivity.

He also found several problems with chimeric output that I think I fixed, so if you could re-run your analysis with the latest patch, it would be great.

Also, I am very interested in checking this issue myself. Could you please point me to the .fastq files for one of the samples (say, the one with the largest number of validated chimeras), and also to the list of validated chimeric junction coordinates? That would save me some time from reading the whole paper. :)

Cheers

Alex

Raffaele Calogero

Apr 19, 2013, 1:59:09 AM

to rna-...@googlegroups.com

Hi Alex,

Many thanks for the quick answer.
Concerning the fastq files and the coordinates of the fusions: due to the security policy of my University, I am unable to open an ftp site to the data. However, if you send me (raffaele...@unito.it) an address to send the data to,
I will put them on a USB stick and ship them to you on Monday by DHL.

I will run the analysis with the patch and let you know as soon as it is finished.
Many thanks again
Raf

Alexander Dobin

May 6, 2013, 10:58:57 PM

to rna-...@googlegroups.com

Hi Raghu,

I exchanged several e-mails with Raffaele on this topic. His latest message was: "From my analysis STAR patched detected 13 out of the 19 fusions that are usually detected by other tools, which is not too bad!"
I will try to play with the STAR parameters to further increase sensitivity next week, after the Biology of Genomes meeting.

Cheers

Alex

On Friday, May 3, 2013 6:37:29 PM UTC-4, Raghu Prasad Rao Metpally wrote:

Hi Raffaele and Alex,

Both are interesting and exciting tools.

Were you able to solve the above issue with the chimera run?

I am curious to start the STAR+chimera runs on our RNASeq data sets.

Raffaele, it would be great if you could post the chimera commands you used.

Thanks
Raghu

Raghu Prasad Rao Metpally

May 8, 2013, 12:35:35 PM

to rna-...@googlegroups.com

Thanks for the reply.

It is interesting to know how many fusions it picked up.

I will look forward to your update.

Best,
Raghu

Santosh Anand

Jan 20, 2014, 6:37:28 AM

to rna-...@googlegroups.com

Dear All,

Are there any updates on the chimera analysis? For example, which parameters should be tuned for the best results on human data?

I would also like to know whether the latest alpha release is suitable for chimera analysis:

ftp://ftp2.cshl.edu/gingeraslab/tracks/STARrelease/Alpha/STAR_2.3.1x.tgz

Thanks in advance!

santosh

Santosh Anand

Jan 22, 2014, 9:45:51 AM

to rna-...@googlegroups.com

With several alpha releases (I checked STAR-2.3.1s, STAR-2.3.1v and STAR-2.3.1x), STAR quits with the following FATAL error when chimeric detection is switched on (--chimSegmentMin 20):

--

EXITING because of FATAL ERROR in reads input: short read sequence line: 1

Read Name=@HWUSI-EAS1825_0012_FC:1:33:16212:15893#0/2

Read Sequence====

DEF_readNameLengthMax=50000

DEF_readSeqLengthMax=50000

--

This error does not occur with the last stable release. I wonder whether there is any difference (benefit?) between the last stable release and the alphas as far as chimera analysis is concerned.

Thanks!

Alexander Dobin

Jan 22, 2014, 3:45:05 PM

to rna-...@googlegroups.com

Hi Santosh,

I have tested the latest patches with chimeric output and did not see this problem.

This looks like a problem with the input file, although it is strange that it appears only when you switch on chimeric detection.

Can you cut the first few reads from your files and send them to me?

The latest patches are generally highly recommended over the 2.3.0e release.

Cheers

Alex

Santosh Anand

Feb 7, 2014, 2:21:15 PM

to

Hi Alex,

Sorry for the late follow-up. I think I know the problem now. The offending reads are those with an empty sequence entry (they were trimmed for adapters, so this can happen). For example, one of the offending reads is:

------read start----

@HWUSI-EAS1825_0012_FC:2:1:15636:1089#0/1

+

------read-end---

This seems to be tolerated by the last stable version of STAR, but the alpha releases throw a fatal error (I just tried the latest alpha-z-patch too). In my opinion, this deserves at most a warning rather than stopping the whole program with a fatal error.

Thanks for your time and help!

santosh
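
Until empty reads are tolerated, records like the one above could be removed before alignment. The following is a minimal sketch, assuming uncompressed four-line FASTQ records and mates listed in the same order in both files. The function names and the `min_length` parameter are hypothetical; this is a pre-filtering workaround, not a STAR feature.

```python
import io
import itertools
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, ...]  # header, sequence, separator, quality


def fastq_records(handle: Iterable[str]) -> Iterator[Record]:
    """Yield four-line FASTQ records with line endings removed."""
    lines = (line.rstrip("\r\n") for line in handle)
    while True:
        record = tuple(itertools.islice(lines, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def drop_empty_pairs(reads1: Iterable[str], reads2: Iterable[str],
                     out1: TextIO, out2: TextIO, min_length: int = 1) -> int:
    """Write pairs where both mates have at least `min_length` bases.

    A pair is dropped as a whole so the two output files stay in sync.
    Returns the number of pairs dropped.
    """
    dropped = 0
    pairs = itertools.zip_longest(fastq_records(reads1), fastq_records(reads2))
    for r1, r2 in pairs:
        if r1 is None or r2 is None:
            raise ValueError("mate files have different numbers of records")
        if len(r1[1]) < min_length or len(r2[1]) < min_length:
            dropped += 1
            continue
        out1.write("\n".join(r1) + "\n")
        out2.write("\n".join(r2) + "\n")
    return dropped


# In-memory illustration with made-up reads; the second pair has an empty mate 1.
mate1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\n\n+\n\n")
mate2 = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nGGCC\n+\nIIII\n")
kept1, kept2 = io.StringIO(), io.StringIO()
print(drop_empty_pairs(mate1, mate2, kept1, kept2), "pair(s) dropped")
```

With real data the same function can be given file handles opened on the two mate files. Raising `min_length` to the `--chimSegmentMin` value would be a further assumption about which reads are still useful.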

Daniel Nicorici

unread,

May 26, 2014, 3:16:36 AM5/26/14

to rna-...@googlegroups.com

Just one quick comment.

The Edgren dataset contains 40 known fusion genes which are RT-PCR validated and not 27!!! You missed 13 of them for some reason!