Questions about parameters in STAR-Fusion

Ziyuan Zhao

Jan 16, 2023, 1:13:54 PM

to STAR-Fusion

Hi, I have two questions about parameters:
1. How can I make the process faster?

I only want the fusion table; the other steps (such as the reconstruction step) are unneeded.

The command:
docker run -v "$(pwd)":/data --rm trinityctat/starfusion \
    STAR-Fusion \
    --left_fq /data/HRR025534_f1.fq.gz \
    --right_fq /data/HRR025534_r2.fq.gz \
    --genome_lib_dir /data/GRCh37_gencode_v19_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir \
    -O /data/StarFusionOutnew1

After the alignment to the reference (about 40 min), it starts optimizing the log-likelihood, which takes a long time for just one sample. Is the command above the simplest way to generate the fusion table, or can I add some parameters to skip the unnecessary steps and make it faster?

2. How can I make the results stricter?
Right now I get 50+ fusions per sample. Can I change some parameters to select fewer fusions with higher confidence? (And if fewer fusions were reported, would it take less time?)

Thank you for your kind reply!

Brian Haas

Jan 18, 2023, 9:55:01 AM

to STAR-Fusion

Hi,

Responses below.

On Monday, January 16, 2023 at 1:13:54 PM UTC-5 129310...@gmail.com wrote:

Hi, I have two questions about parameters:
1. How can I make the process faster?

The current release (v1.12.0) runs STAR in one-pass mode, which should be faster than the earlier two-pass default. After the alignment step, the rest should be relatively fast. The --CPU parameter lets you give it more cores, and that will speed up several steps too.
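
To illustrate the --CPU suggestion, here is a minimal bash sketch that rebuilds the command from the question with --CPU added. Only --CPU itself comes from the reply above; the helper name make_starfusion_cmd and the SF_CMD array are hypothetical, and the core count is whatever you pass in.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR-Fusion): fills the global array
# SF_CMD with the docker + STAR-Fusion command from the question,
# adding --CPU with the core count given as the first argument.
make_starfusion_cmd() {
  local ncpu="$1"
  SF_CMD=(docker run -v "$(pwd)":/data --rm trinityctat/starfusion
          STAR-Fusion
          --left_fq /data/HRR025534_f1.fq.gz
          --right_fq /data/HRR025534_r2.fq.gz
          --genome_lib_dir /data/GRCh37_gencode_v19_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir
          --CPU "$ncpu"
          -O /data/StarFusionOutnew1)
}
```

For example, `make_starfusion_cmd "$(nproc)"` followed by `"${SF_CMD[@]}"` would run it with one thread per available core (`nproc` is from GNU coreutils). Keeping the arguments in an array preserves the quoting of each path.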

I only want the fusion table; the other steps (such as the reconstruction step) are unneeded.

The command:
docker run -v "$(pwd)":/data --rm trinityctat/starfusion \
    STAR-Fusion \
    --left_fq /data/HRR025534_f1.fq.gz \
    --right_fq /data/HRR025534_r2.fq.gz \
    --genome_lib_dir /data/GRCh37_gencode_v19_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir \
    -O /data/StarFusionOutnew1

After the alignment to the reference (about 40 min), it starts optimizing the log-likelihood, which takes a long time for just one sample. Is the command above the simplest way to generate the fusion table, or can I add some parameters to skip the unnecessary steps and make it faster?

I'm surprised the EM (expectation-maximization) step, which is the log-likelihood optimization you're seeing, is taking so long. Usually it's pretty quick, but I suppose some complex scenarios could make it run longer. You can disable that step with --skip_EM.
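
A similar bash sketch for --skip_EM: the function below assembles the STAR-Fusion arguments from the question and appends --skip_EM only when asked, so the same script can be run with or without the EM step. The function name starfusion_args, its yes/no argument, and the SF_ARGS array are hypothetical; --skip_EM is the flag named in the reply.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fills the global array SF_ARGS with the STAR-Fusion
# arguments from the question; appends --skip_EM when $1 is "yes".
starfusion_args() {
  local skip_em="$1"
  SF_ARGS=(--left_fq /data/HRR025534_f1.fq.gz
           --right_fq /data/HRR025534_r2.fq.gz
           --genome_lib_dir /data/GRCh37_gencode_v19_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir
           -O /data/StarFusionOutnew1)
  # Only add the flag on request, so the default run is unchanged.
  if [[ "$skip_em" == yes ]]; then
    SF_ARGS+=(--skip_EM)
  fi
}
```

Usage: `starfusion_args yes`, then `docker run -v "$(pwd)":/data --rm trinityctat/starfusion STAR-Fusion "${SF_ARGS[@]}"`.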

2. How can I make the results stricter?
Right now I get 50+ fusions per sample. Can I change some parameters to select fewer fusions with higher confidence? (And if fewer fusions were reported, would it take less time?)

Reducing the number of fusions reported won't make it faster. You can add your own post-processing to filter the results with more restrictive criteria. By default, STAR-Fusion applies a minimum fusion expression threshold of 0.1 FFPM (fusion fragments per million total RNA-Seq fragments), which is sufficient to exclude fusions with little evidence and those that tend not to be relevant to cancer. If you're only interested in fusions known to be cancer-relevant, you could filter on the fusion annotations to restrict to certain attributes, such as known COSMIC fusions, but not all cancer-relevant fusions are in COSMIC. Some combination of filters might do the trick.
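
One way to do this kind of post-processing is a small awk filter over the prediction table. This sketch rests on assumptions not stated in the thread: that the table is tab-separated with a header line containing columns named FFPM and annots (as in STAR-Fusion's star-fusion.fusion_predictions.abridged.tsv), and that COSMIC-listed fusions carry an annotation token such as Cosmic. Check the header and annotation strings in your own output first. The function name filter_fusions and any cutoff you pass are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical post-filter for a STAR-Fusion prediction table.
# Assumes a tab-separated file whose header names the columns "FFPM" and "annots".
# Usage: filter_fusions TABLE MIN_FFPM [ANNOT_TOKEN]
filter_fusions() {
  local table="$1" min_ffpm="$2" token="${3:-}"
  awk -F'\t' -v min="$min_ffpm" -v tok="$token" '
    NR == 1 {
      # Locate columns by header name rather than by fixed position.
      for (i = 1; i <= NF; i++) {
        if ($i == "FFPM") ffpm_col = i
        if ($i == "annots") annot_col = i
      }
      if (!ffpm_col) { print "no FFPM column" > "/dev/stderr"; exit 1 }
      if (tok != "" && !annot_col) { print "no annots column" > "/dev/stderr"; exit 1 }
      print   # keep the header line
      next
    }
    # Keep rows at or above the FFPM cutoff and, if a token was given,
    # only rows whose annots field contains it (plain substring match).
    ($ffpm_col + 0) >= (min + 0) && (tok == "" || index($annot_col, tok) > 0)
  ' "$table"
}
```

For example, `filter_fusions StarFusionOutnew1/star-fusion.fusion_predictions.abridged.tsv 1 Cosmic` would keep only rows with FFPM of at least 1 whose annotations mention Cosmic. Looking columns up by name keeps the filter independent of column order.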

Best of luck.

Ziyuan Zhao

Jan 23, 2023, 1:11:18 AM

to STAR-Fusion

OK, thank you very much for your advice!
