STAR-Fusion detected fewer fusions with custom human reference


Vineela Gangalapudi

Apr 17, 2018, 9:30:34 AM
to STAR-Fusion
Hi,

I ran STAR-Fusion with the prebuilt CTAT library for GRCh37_v19, downloaded from https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/, and 185 fusion genes were predicted. I am trying to add STAR-Fusion to my genomics pipeline, so I wanted to build a STAR-Fusion index for the hg19 reference that is already used in our pipeline. I followed the steps in FusionFilter and ran the following commands:

./prep_genome_lib.pl --genome_fa ucsc.hg19.fasta --gtf ucsc.hg19_star.gtf --pfam_db ./Pfam-A.hmm --CPU $SLURM_CPUS_PER_TASK --output_dir new

./prep_genome_lib.pl --genome_fa ucsc.hg19.fasta --gtf ucsc.hg19_star.gtf --fusion_annot_lib ./new --annot_filter_rule ./new/AnnotFilterRule.pm --pfam_db PFAM.domtblout.dat.gz --CPU $SLURM_CPUS_PER_TASK --output_dir star_fusion_index

After running STAR-Fusion with this index, only 54 fusion genes were predicted for the same sample. I don't understand the reason for such a large change. I used the same STAR mapping parameters specified on your GitHub page and generated the Chimeric.out.junction file to run STAR-Fusion.
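
One way to see which calls differ between the two runs is to compare the fusion names in the two prediction files. This is only a minimal sketch: it assumes each output is a tab-delimited file with the fusion name in the first column and header lines starting with `#`, and the function name and file paths are hypothetical.

```shell
# Print fusion names present in the first predictions file but absent from the second.
# Assumes tab-delimited input, fusion name in column 1, header lines starting with '#'.
fusions_only_in_first() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    grep -v '^#' "$1" | cut -f1 | sort -u > "$a"
    grep -v '^#' "$2" | cut -f1 | sort -u > "$b"
    # comm -23 keeps lines unique to the first sorted list
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example call (hypothetical paths):
#   fusions_only_in_first ctat_run/predictions.tsv custom_run/predictions.tsv
```

Swapping the two arguments lists the calls that appear only with the custom index.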

I looked into the pipeliner.cmds file that is generated while running prep_genome_lib.pl. Repeat masking did not happen; blastn started directly. Could this be the reason for such a large change?

At this link, https://github.com/FusionFilter/FusionFilter/blob/master/bhaas.Broad.build.notes, I can see that RepeatMasker and blastn are both run within prep_genome_lib.pl, but only blastn ran in my case.
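
A quick way to confirm which steps a build actually recorded is to count the lines of the command log that mention each tool. This is a rough sketch, assuming `pipeliner.cmds` is a plain-text file with one recorded command per line; the helper name is made up for illustration.

```shell
# Count lines in a command log that mention a given tool name (case-insensitive).
# Prints 0 when the tool never appears.
count_tool_mentions() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -i -c -- "$2" "$1" || true
}

# Example calls (path is illustrative):
#   count_tool_mentions pipeliner.cmds repeatmasker
#   count_tool_mentions pipeliner.cmds blastn
```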

Could you please point me in the right direction?

Thanks.

Brian Haas

Apr 17, 2018, 9:44:41 AM
to Vineela Gangalapudi, STAR-Fusion
Hi,

Are you comparing results from different versions of STAR-Fusion? The latest release (v1.3) works a bit differently and incorporates an expression threshold in filtering the results, so you should get less cruft (i.e., false positives) in the result set, and fewer results reported overall. There are also some annotation-based filters to remove 'red herrings'.

best,

~brian


--
You received this message because you are subscribed to the Google Groups "STAR-Fusion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to star-fusion+unsubscribe@googlegroups.com.
To post to this group, send email to star-...@googlegroups.com.
Visit this group at https://groups.google.com/group/star-fusion.
To view this discussion on the web visit https://groups.google.com/d/msgid/star-fusion/efbbdad3-952b-4fcd-8974-02df9050cbcb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Vineela Gangalapudi

Apr 17, 2018, 9:51:58 AM
to STAR-Fusion
I am using the same version of STAR-Fusion (STAR-Fusion-v1.2.0) and the hg19 genome for both runs. The only difference between the two runs is the STAR-Fusion index. In the first case I used the CTAT index, while in the second case I built the STAR-Fusion index myself, following the FusionFilter steps.

Do I have to run RepeatMasker before I run FusionFilter?



Brian Haas

Apr 17, 2018, 9:55:53 AM
to Vineela Gangalapudi, STAR-Fusion
I see.

The CTAT genome lib build for v1.2 and earlier did involve a repeat-masking step. This was all changed in v1.3.

You could try to build your own for v1.2, but you'd need to follow the instructions that are specific to that distribution. Everything is versioned on GitHub (code and wiki documentation), so you might be able to figure something out there. I don't think I was so good about ensuring the various documentation bundles were in each of the submodules. I'll check that so it will happen going forward.

I'd suggest just upgrading and using the latest documentation for everything.

Apologies for any pain here.

best,

~b


Brian Haas

Apr 17, 2018, 9:59:11 AM
to Vineela Gangalapudi, STAR-Fusion
You might very well find a wiki directory in the corresponding FusionFilter/ directory, which should have matched documentation.

best,

~brian

Vineela Gangalapudi

Apr 17, 2018, 10:01:56 AM
to Brian Haas, STAR-Fusion
This is perfect. I found the directory. Thanks a lot for the prompt response. I will rebuild the index and run STAR-Fusion.


Brian Haas

Apr 17, 2018, 10:02:34 AM
to Vineela Gangalapudi, STAR-Fusion
OK, best of luck.

Try to upgrade as soon as you can, though.  

~b

Vineela Gangalapudi

Apr 17, 2018, 10:04:18 AM
to Brian Haas, STAR-Fusion
Yup, doing it right now

Brian Haas

Apr 17, 2018, 10:04:36 AM
to Vineela Gangalapudi, STAR-Fusion
:-)


Vineela Gangalapudi

Apr 24, 2018, 9:12:29 AM
to STAR-Fusion
Hi,

I upgraded to STAR-Fusion v1.3.1, followed the wiki document in the FusionFilter.wiki directory, and ran the following steps:

1. RepeatMasker -pa $SLURM_CPUS_PER_TASK  -s -species human -xsmall cDNA_seqs.fa

###all-vs-all blastn

module load blast

makeblastdb -in cDNA_seqs.fa.masked -dbtype nucl

blastn -query cDNA_seqs.fa.masked -db cDNA_seqs.fa.masked \
            -max_target_seqs 10000 -outfmt 6 \
            -evalue 1e-3 -lcase_masking \
            -num_threads $SLURM_CPUS_PER_TASK \
            -word_size 11  >  blast_pairs.outfmt6

FusionFilter/util/blast_outfmt6_replace_trans_id_w_gene_symbol.pl cDNA_seqs.fa blast_pairs.outfmt6  | gzip > blast_pairs.gene_syms.outfmt6.gz

module load STAR/2.5.4a
module load blast/2.6.0+

module load samtools
./prep_genome_lib.pl --genome_fa ucsc.hg19.fasta --gtf ucsc.hg19_star.gtf --blast_pairs blast_pairs.gene_syms.outfmt6.gz --CPU $SLURM_CPUS_PER_TASK --pfam_db ./Pfam-A.hmm


In the last step, although I provided the blast_pairs.gene_syms.outfmt6.gz file, the all-vs-all BLAST ran again and generated similar files: ref_annot.cdsplus.allvsall.outfmt6.genesym.gz and ref_annot.cdna.allvsall.outfmt6.toGenes.sorted.gz. Also, in the help for prep_genome_lib.pl I don't see a parameter for adding the cDNA file to the command. Please let me know if I am missing anything.

Thanks.






Brian Haas

Apr 24, 2018, 9:34:45 AM
to Vineela Gangalapudi, STAR-Fusion
Hi Vineela,

If you've upgraded to the newer STAR-Fusion, then you'd follow these instructions:


This is different from the earlier FusionFilter documentation. If you're using the older STAR-Fusion, you'd follow the older FusionFilter docs.

Apologies, but the documentation for FusionFilter that got bundled in the later STAR-Fusion releases is the old version of the documentation. The wiki submodules didn't get updated. I've been fighting with the submodules for a while and think I've got it figured out now; the next release will have the updated documentation going with it.

Again, sorry for the trouble!

~brian




Vineela Gangalapudi

Apr 24, 2018, 9:38:53 AM
to Brian Haas, STAR-Fusion
So there is no need to repeat-mask the genome at all? I can directly follow the steps at https://github.com/FusionFilter/FusionFilter/wiki/Building-a-Custom-FusionFilter-Dataset and build the index. Is that correct?

Brian Haas

Apr 24, 2018, 9:40:57 AM
to Vineela Gangalapudi, STAR-Fusion
That's right.    The repeat-masking was used in the older versions, not the newer versions.   It's handled differently now.    There are fewer steps and more automation.

best,

~b

Vineela Gangalapudi

Apr 24, 2018, 9:52:41 AM
to Brian Haas, STAR-Fusion
Perfect, I have all the files then. Thank you.

Vineela Gangalapudi

Apr 25, 2018, 12:25:32 PM
to STAR-Fusion
I was able to run STAR-Fusion to completion with version 1.2. After upgrading to 1.3.1, I get the following error:

Can't call method "fetch" on an undefined value at STAR-Fusion-v1.3.1/util/STAR-Fusion.map_chimeric_reads_to_genes line 302, <$fh> line 15

I saw a discussion on this in which you suggested a patch and provided this link: https://github.com/STAR-Fusion/STAR-Fusion/blob/patch-master/util/STAR-Fusion.map_chimeric_reads_to_genes. The link is not working at the moment. Has this issue been fixed in version 1.3.2?

Please help.




Brian Haas

Apr 25, 2018, 12:27:36 PM
to Vineela Gangalapudi, STAR-Fusion
Yes, should be fixed in 1.3.2

-Brian
(by iPhone)

Vineela Gangalapudi

Apr 25, 2018, 12:27:53 PM
to STAR-Fusion
I found your response to a similar issue. Thanks, I will download the dev version.

Vineela Gangalapudi

May 1, 2018, 2:52:41 PM
to STAR-Fusion
Is it mandatory to include a GTF file when building a custom FusionFilter dataset?

Brian Haas

May 1, 2018, 2:57:28 PM
to Vineela Gangalapudi, STAR-Fusion

Yes, it's critical for all of STAR-Fusion.




Vineela Gangalapudi

May 1, 2018, 3:05:20 PM
to STAR-Fusion
Okay. Previously I ran STAR without a GTF file in the index generation step. Later I used the coordinate-sorted BAM from STAR with the featureCounts tool to generate exon-, gene-, and transcript-level counts.

When running featureCounts, I provided the annotation of interest, either UCSC or Ensembl.

But now that I am using a GTF file in the prep_genome_lib.pl step, I am confining my BAM output to either UCSC or Ensembl. Any suggestions?

Brian Haas

May 1, 2018, 3:22:10 PM
to Vineela Gangalapudi, STAR-Fusion
We provide the GENCODE reference GTF file with our source and plug-n-play repos.

If you're going to make your own, Ensembl would be the closest, since GENCODE leverages it.
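
When pairing a custom genome FASTA with a GTF from a different source, it may be worth checking that the sequence names agree, since UCSC-style files use names like `chr1` while Ensembl files typically use `1`. The following is a minimal sketch that lists GTF sequence names missing from the FASTA headers; the function name is hypothetical and the file names in the example call are placeholders.

```shell
# List sequence names used in a GTF (column 1) that have no matching FASTA header.
# An empty result means every GTF sequence name exists in the FASTA.
gtf_seqnames_missing_from_fasta() {
    local fa_names gtf_names
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA headers: keep the first whitespace-delimited token after '>'
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fa_names"
    # GTF: skip comment lines, keep column 1
    grep -v '^#' "$2" | cut -f1 | sort -u > "$gtf_names"
    # comm -13 keeps lines unique to the second sorted list (the GTF names)
    comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
}

# Example call (placeholder file names):
#   gtf_seqnames_missing_from_fasta genome.fasta annotation.gtf
```

Any names printed would need to be reconciled (or a matching annotation chosen) before building the library.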




Vineela Gangalapudi

May 15, 2018, 12:18:23 PM
to STAR-Fusion
Hello Brian,

I have a quick question. I was going through the STAR-Fusion paper, https://www.biorxiv.org/content/early/2017/03/24/120295. May I know which versions of deFuse, TopHat-Fusion, and FusionCatcher were used for the comparison? I was previously using deFuse (v0.7) and am planning to replace it with STAR-Fusion, hence the question.

Thanks

Vineela Gangalapudi

May 15, 2018, 12:37:32 PM
to STAR-Fusion
I found the information in the supplementary tables. Thanks.


Brian Haas

May 15, 2018, 12:46:17 PM
to Vineela Gangalapudi, STAR-Fusion
Terrific! I was hoping I had put it there.

best,

~b

