FusionInspector default setting

MH

Nov 17, 2023, 9:25:43 AM

to Trinity_CTAT_users

Dear Brian and team,

Thank you for developing and maintaining FusionInspector!

Would you be able to state the default settings for FusionInspector in the documentation, or in the command-line help shown when --help is executed? I see there are quite a few optional arguments, but it would be useful to know their default values first.

Best regards,

Min

Brian Haas

Nov 17, 2023, 9:46:38 AM

to MH, Trinity_CTAT_users

Hi Min,

Here's the main documentation of interest:
https://github.com/FusionInspector/FusionInspector/wiki

and the simplest usage could be:

FusionInspector --fusions fusions.listA.txt,fusions.listB.txt \
--genome_lib /path/to/CTAT_genome_lib \
--left_fq rnaseq_1.fq --right_fq rnaseq_2.fq \
--out_dir my_FusionInspector_outdir \
--out_prefix finspector \
--vis

best,

B

--
You received this message because you are subscribed to the Google Groups "Trinity_CTAT_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinity_ctat_us...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trinity_ctat_users/7461eaa8-b1bf-4668-9842-e66491cc9fcbn%40googlegroups.com.

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

MH

Nov 17, 2023, 9:57:55 AM

to Trinity_CTAT_users

Thank you for your reply!

I did have a look at the documentation before but couldn't find that information. What are the default values for these settings (among others)? I think I might need to adjust the values a little, as some of my fusions of interest are not showing up and I'm not sure whether that's because the settings are too stringent.

--min_junction_reads MIN_JUNCTION_READS
minimum number of junction-spanning reads required
--min_sum_frags MIN_SUM_FRAGS
minimum fusion support = ( # junction_reads + # spanning_frags )
--min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT
minimum number of junction reads required if breakpoint lacks involvement of only reference junctions
--min_spanning_frags_only MIN_SPANNING_FRAGS_ONLY
minimum number of spanning frags if no junction reads are found
--require_LDAS REQUIRE_LDAS
require long double anchor support for split reads when no spanning frags are found
--max_promiscuity MAX_PROMISCUITY
maximum number of partners allowed for a given fusion
--min_pct_dom_promiscuity MIN_PCT_DOM_PROMISCUITY
for promiscuous fusions, those with less than this support of the dominant scoring pair are filtered prior to applying the max_promiscuity filter.
--min_per_id MIN_PER_ID
minimum percent identity for a fusion-supporting read alignment
--max_mate_dist MAX_MATE_DIST
max distance between mates, also max intron length for STAR alignments

Brian Haas

Nov 17, 2023, 10:00:54 AM

to MH, Trinity_CTAT_users

Ah, yes. Let me look into that. I wish argparse would just provide the defaults, but I'll add them to the descriptions and share them shortly.
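For reference, Python's standard argparse module can do this on its own: passing `formatter_class=argparse.ArgumentDefaultsHelpFormatter` to `ArgumentParser` appends "(default: ...)" to each option's help text. A minimal sketch that mocks up only three of the options (this is not FusionInspector's actual parser; the default values are the ones listed in the updated usage later in this thread):

```python
import argparse


def build_parser():
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to each option's help
    parser = argparse.ArgumentParser(
        prog="FusionInspector",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--min_junction_reads", type=int, default=0,
                        help="minimum number of junction-spanning reads required")
    parser.add_argument("--min_sum_frags", type=int, default=1,
                        help="minimum fusion support = ( # junction_reads + # spanning_frags )")
    parser.add_argument("--max_mate_dist", type=int, default=100000,
                        help="max distance between mates, also max intron length for STAR alignments")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Note that help strings passed to this formatter are %-formatted, so a literal "%" in a description must be written as "%%".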

Brian Haas

Jan 2, 2024, 12:58:13 PM

to Trinity_CTAT_users

Looks like I got distracted here... I've now updated the usage info to include the default values in the descriptions, and this will go into the next release. Here's the updated usage info:

optional arguments:

--genome_lib_dir GENOME_LIB_DIR

genome lib directory - see http://FusionFilter.github.io for details. Uses env var CTAT_GENOME_LIB as default

--samples_file SAMPLES_FILE

samples file for smartSeq2 single cell rna-seq (format: sample(tab)/path/left.fq(tab)/path/right.fq)

-O STR_OUT_DIR, --output_dir STR_OUT_DIR

output directory

--out_prefix OUT_PREFIX

output filename prefix (default: finspector)

--min_junction_reads MIN_JUNCTION_READS

minimum number of junction-spanning reads required (default: 0)

--min_sum_frags MIN_SUM_FRAGS

minimum fusion support = ( # junction_reads + # spanning_frags ) (default: 1)

--min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT

minimum number of junction reads required if breakpoint lacks involvement of only reference junctions (default: 3)

--min_spanning_frags_only MIN_SPANNING_FRAGS_ONLY

minimum number of spanning frags if no junction reads are found (default: 5)

--require_LDAS REQUIRE_LDAS

require long double anchor support for split reads when no spanning frags are found (default: 1)

--max_promiscuity MAX_PROMISCUITY

maximum number of partners allowed for a given fusion (default: 10)

--min_pct_dom_promiscuity MIN_PCT_DOM_PROMISCUITY

for promiscuous fusions, those with less than this support of the dominant scoring pair are filtered prior to applying the max_promiscuity filter. (default: 50)

--min_per_id MIN_PER_ID

minimum percent identity for a fusion-supporting read alignment (default: 96)

--max_mate_dist MAX_MATE_DIST

max distance between mates, also max intron length for STAR alignments (default: 100000)

--only_fusion_reads include only read alignments in output that support fusion

--capture_genome_alignments

reports ref genome alignments too (for debugging only)

--include_Trinity include fusion-guided Trinity assembly

--vis generate bam, bed, etc., and generate igv-reports html visualization

--write_intermediate_results

generate bam, bed, etc., for intermediate aligner outputs

--cleanup cleanup the fusion inspector workspace, remove intermediate output files

--CPU CPU number of threads for multithreaded processes (default: 4)

--annotate annotate fusions based on known cancer fusions and those found in normal tissues

--examine_coding_effect

explore impact of fusions on coding sequences

--aligner_path ALIGNER_PATH

path to the aligner tool (default: uses PATH setting)

--fusion_contigs_only

align reads only to the fusion contigs (note, FFPM calcs disabled in this mode)

--extract_fusion_reads_file EXTRACT_FUSION_READS_FILE

file prefix to write fusion evidence reads in fastq format

--no_remove_dups do not exclude duplicate reads

--version provide version info: 2.9.0

--no_FFPM do not compute FFPM value, i.e. when using inspect instead of validate mode, in which case FFPM would not be meaningful because the full sample of reads is not evaluated

--no_splice_score_boost

do not augment alignment score for spliced alignments

--no_shrink_introns do not shrink introns

--shrink_intron_max_length SHRINK_INTRON_MAX_LENGTH

maximum length of introns when shrunk (default: 1000)

--skip_EM skip expectation maximization step that fractionally assigns spanning frags across multiple breakpoints

--incl_microH_expr_brkpt_plots

include microhomology expression breakpoint plots

--predict_cosmic_like

predict if fusion looks COSMIC-like wrt expression and microhomology characteristics. Automatically disabled if --no_FFPM is set.

--STAR_xtra_params STAR_XTRA_PARAMS

extra parameters to pass on to the STAR aligner

--no_homology_filter no gene symbol-based blast pair homology filter or promiscuity checks to remove potential false positives

--no_annot_filter no annotation-based filters applied (i.e. removing GTEx normal fusions)

--max_sensitivity max sensitivity settings (specificity unchecked) equivalent to --min_sum_frags 1 --min_spanning_frags_only 1 --min_novel_junction_support 1 --require_LDAS 0 --no_homology_filter --no_annot_filter --min_per_id 1 --no_remove_dups --skip_EM

--extreme_sensitivity

extreme sensitivity. If there are evidence reads, this should ideally find them; however, the false positive rate is expected to be maximally high too! Equivalent to settings: --max_sensitivity --fusion_contigs_only --max_mate_dist 10000000

--FI_contigs_gtf FI_CONTIGS_GTF

provide the fusion inspector contig targets directly instead of making it at runtime.

--FI_contigs_fa FI_CONTIGS_FA

provide the fusion inspector contigs fasta directly instead of making it at runtime
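To see how far the two sensitivity presets move each threshold away from its default, the help text above can be transcribed into a small lookup. A sketch under stated assumptions: the dictionaries are copied by hand from this usage text (they are not read from FusionInspector itself), and the helper names `preset_values`, `expand_preset` and `changed_from_defaults` are hypothetical.

```python
# Numeric defaults, transcribed by hand from the usage text in this thread
DEFAULTS = {
    "min_junction_reads": 0,
    "min_sum_frags": 1,
    "min_novel_junction_support": 3,
    "min_spanning_frags_only": 5,
    "require_LDAS": 1,
    "max_promiscuity": 10,
    "min_pct_dom_promiscuity": 50,
    "min_per_id": 96,
    "max_mate_dist": 100000,
}

# Valued options implied by --max_sensitivity, per its help text
MAX_SENSITIVITY_VALUES = {
    "min_sum_frags": 1,
    "min_spanning_frags_only": 1,
    "min_novel_junction_support": 1,
    "require_LDAS": 0,
    "min_per_id": 1,
}

# On/off switches implied by --max_sensitivity, per its help text
MAX_SENSITIVITY_SWITCHES = [
    "--no_homology_filter", "--no_annot_filter", "--no_remove_dups", "--skip_EM",
]

PRESETS = ("max_sensitivity", "extreme_sensitivity")


def preset_values(preset):
    """Numeric settings a preset implies, layered over the documented defaults."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    values = dict(DEFAULTS)
    values.update(MAX_SENSITIVITY_VALUES)
    if preset == "extreme_sensitivity":
        # --extreme_sensitivity = --max_sensitivity + --fusion_contigs_only + larger mate distance
        values["max_mate_dist"] = 10_000_000
    return values


def expand_preset(preset):
    """Spell a preset out as the explicit flags listed in its help text."""
    values = preset_values(preset)
    flags = []
    for name in MAX_SENSITIVITY_VALUES:
        flags += [f"--{name}", str(values[name])]
    flags += MAX_SENSITIVITY_SWITCHES
    if preset == "extreme_sensitivity":
        flags += ["--fusion_contigs_only", "--max_mate_dist", str(values["max_mate_dist"])]
    return flags


def changed_from_defaults(preset):
    """Map each numeric setting the preset changes to (default, preset value)."""
    values = preset_values(preset)
    return {k: (DEFAULTS[k], v) for k, v in values.items() if v != DEFAULTS[k]}


if __name__ == "__main__":
    print(" ".join(expand_preset("max_sensitivity")))
    for name, (old, new) in changed_from_defaults("max_sensitivity").items():
        print(f"{name}: {old} -> {new}")
```

Running it shows, for example, that --max_sensitivity lowers --min_spanning_frags_only from 5 to 1 and --min_per_id from 96 to 1, which is useful context when relaxing settings to recover fusions that are not showing up.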

MH

Jan 10, 2024, 12:33:19 AM

to Trinity_CTAT_users

Happy new year & thank you for your reply! :)

Could you also explain these columns in the FusionInspector output file for me:

CDS_LEFT_RANGE (eg. 1-1159)
CDS_RIGHT_RANGE (eg. 514-3471)

In these two columns, the value is a number range that seems to indicate where the 5' and 3' transcripts start and stop in the predicted fusion CDS. If that's the case, why is there overlap between these two ranges?

FUSION_CDS

This column outputs the sequence of predicted CDS of the fusion transcript. This is pretty clear, however:

(1) Where is the predicted breakpoint located, or is it not indicated?

(2) What do the lower- and upper-case letters mean? What about the asterisks?

(3) Is this just the concatenation of the sequence from the start of the first exon of the 5' transcript up to the breakpoint and the sequence from the breakpoint to the end of the last exon of the 3' transcript?

FUSION_TRANSL

Looks like this is the predicted sequence of the fusion protein product. How is this determined? Is it the translation of the sequence between the first START and STOP codons in the CDS?

Brian Haas

Jan 10, 2024, 8:41:56 AM

to MH, Trinity_CTAT_users

Hi,

Responses below

On Wed, Jan 10, 2024 at 12:33 AM MH <minhu...@gmail.com> wrote:

Could you also explain these columns in the FusionInspector output file for me:

CDS_LEFT_RANGE (eg. 1-1159)
CDS_RIGHT_RANGE (eg. 514-3471)
In these two columns, the value is a number range that seems to indicate where the 5' and 3' transcripts start and stop in the predicted fusion CDS. If that's the case, why is there overlap between these two ranges?

From the fusion breakpoint on the genome, it attempts to reconstruct fusion transcripts based on the reference genome annotations that have splicing at the breakpoints. CDS_LEFT_RANGE corresponds to the left gene and CDS_RIGHT_RANGE to the right gene, and the coordinates refer to the cDNA sequences of the respective reference transcript structures that are fused together (so each range is relative to its own reference transcript).
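A small sketch of that reading, assuming both ranges are 1-based and inclusive, with CDS_LEFT_RANGE in the left partner's reference transcript coordinates and CDS_RIGHT_RANGE in the right partner's. Under that assumption the two ranges never refer to the same sequence, so the apparent overlap (1-1159 vs 514-3471) reflects two separate coordinate systems, and the fused CDS length is the sum of the two spans. The function names are hypothetical; comparing the result against the length of FUSION_CDS may serve as a sanity check (an assumption, not something stated in this thread).

```python
def parse_range(text):
    """Parse a 'start-end' string such as '514-3471' into two integers."""
    start, end = (int(part) for part in text.split("-"))
    if start > end:
        raise ValueError(f"range start exceeds end: {text}")
    return start, end


def span_length(text):
    """Number of bases in a 1-based, inclusive range."""
    start, end = parse_range(text)
    return end - start + 1


def fused_cds_length(cds_left_range, cds_right_range):
    """Left-partner span plus right-partner span, each in its own transcript coordinates."""
    return span_length(cds_left_range) + span_length(cds_right_range)


if __name__ == "__main__":
    print(fused_cds_length("1-1159", "514-3471"))
```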

FUSION_CDS
This column outputs the sequence of predicted CDS of the fusion transcript. This is pretty clear, however:
(1) Where is the predicted breakpoint located, or is it not indicated?

The breakpoint is the one reported for the fusion in genome coordinates. If you want transcript-relative coordinates, see the CDS_LEFT_RANGE and CDS_RIGHT_RANGE info above.

(2) What do the lower- and upper-case letters mean? What about the asterisks?

Upper/lower case should distinguish the two fusion partner sequences from which the fusion is derived. If you see an asterisk in a translated sequence, that represents a stop codon.
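If you want the two partner segments separately, one option is to cut FUSION_CDS where the letter case first changes. A minimal sketch, assuming exactly one case switch at the junction; which partner is written in lower case is not stated here, so the hypothetical `split_partners` function simply returns the two runs in order:

```python
import re


def split_partners(fusion_cds):
    """Split a fusion CDS at the first switch between lower- and upper-case letters."""
    # Match the leading run of same-case letters (all lower or all upper)
    match = re.match(r"[a-z]+|[A-Z]+", fusion_cds)
    if match is None:
        raise ValueError("sequence must start with a letter")
    cut = match.end()
    return fusion_cds[:cut], fusion_cds[cut:]


if __name__ == "__main__":
    print(split_partners("atgaaaGGGTAA"))
```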

(3) Is this just the concatenation of the sequence from the start of the first exon of the 5' transcript up to the breakpoint and the sequence from the breakpoint to the end of the last exon of the 3' transcript?

Yes. It tries all combinations of reference gene structure isoforms, and if one of them is in frame, it prefers that over those that involve a frameshift.
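The in-frame test itself can be sketched as modular arithmetic. Assume, hypothetically, that for each isoform pair you know how many left-partner CDS bases precede the breakpoint and the 0-based offset of the first retained right-partner base from the right partner's CDS start; the junction is in frame when both give the same codon phase. The `IsoformPair` fields and `pick_pair` are illustrative names, not FusionInspector's code:

```python
from typing import NamedTuple


class IsoformPair(NamedTuple):
    left_isoform: str       # hypothetical left-partner isoform ID
    right_isoform: str      # hypothetical right-partner isoform ID
    left_cds_bases: int     # left CDS bases contributed up to the breakpoint
    right_cds_offset: int   # 0-based offset of first retained right base from its CDS start


def is_in_frame(pair):
    """The base after the left part has phase left_cds_bases % 3; the right base must match it."""
    return pair.left_cds_bases % 3 == pair.right_cds_offset % 3


def pick_pair(pairs):
    """Prefer the first in-frame pair; otherwise fall back to the first pair."""
    if not pairs:
        raise ValueError("no isoform pairs to choose from")
    in_frame = [p for p in pairs if is_in_frame(p)]
    return in_frame[0] if in_frame else pairs[0]


if __name__ == "__main__":
    candidates = [IsoformPair("L.1", "R.1", 100, 0), IsoformPair("L.2", "R.1", 99, 0)]
    print(pick_pair(candidates))
```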

FUSION_TRANSL
Looks like this is the predicted sequence of the fusion protein product. How is this determined? Is it the translation of the sequence between the first START and STOP codons in the CDS?

It'll translate the entire fusion CDS sequence. If there's a frameshift, you'll tend to see stops (asterisks) in the latter part.
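A minimal sketch of translating the whole CDS codon by codon with the standard genetic code, writing stop codons as `*` as in FUSION_TRANSL. It upper-cases the input first (the case only marks the fusion partners), ignores a trailing partial codon and writes `X` for codons containing non-ACGT characters; those two choices are assumptions, not necessarily what FusionInspector does:

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate every complete codon; stops become '*', unknown codons 'X'."""
    seq = cds.upper()  # letter case only marks the fusion partners, so ignore it
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("atgaaaGGGTAA"))
```

Because the whole sequence is translated rather than only up to the first stop, a frameshift at the junction shows up as downstream asterisks instead of a truncated protein.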

I hope this helps clarify things.

best,

Brian
