FusionInspector default setting


MH

Nov 17, 2023, 9:25:43 AM
to Trinity_CTAT_users
Dear Brian and team,

Thank you for developing and maintaining FusionInspector!

Would you be able to state the default settings for FusionInspector in the documentation, or in the usage text printed when --help is run? I see there are quite a few optional arguments, and it would be useful to know their default values first.


Best regards,
Min

Brian Haas

Nov 17, 2023, 9:46:38 AM
to MH, Trinity_CTAT_users
Hi Min,

Here's the main documentation of interest:
https://github.com/FusionInspector/FusionInspector/wiki

and the simplest usage could be:

FusionInspector --fusions fusions.listA.txt,fusions.listB.txt \
                --genome_lib /path/to/CTAT_genome_lib \
                --left_fq rnaseq_1.fq --right_fq rnaseq_2.fq \
                --out_dir my_FusionInspector_outdir \
                --out_prefix finspector \
                --vis


best,

B

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

MH

Nov 17, 2023, 9:57:55 AM
to Trinity_CTAT_users
Thank you for your reply! 

I did have a look at the documentation before but couldn't find that information. For these settings (among others), what are the default values? I think I might need to adjust the values a little, as some of my fusions of interest are not showing up and I'm not sure whether that's because the settings are too stringent.

--min_junction_reads MIN_JUNCTION_READS
                        minimum number of junction-spanning reads required
  --min_sum_frags MIN_SUM_FRAGS
                        minimum fusion support = ( # junction_reads + # spanning_frags )
  --min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT
                        minimum number of junction reads required if breakpoint lacks involvement of only reference junctions
  --min_spanning_frags_only MIN_SPANNING_FRAGS_ONLY
                        minimum number of spanning frags if no junction reads are found
  --require_LDAS REQUIRE_LDAS
                        require long double anchor support for split reads when no spanning frags are found
  --max_promiscuity MAX_PROMISCUITY
                        maximum number of partners allowed for a given fusion
  --min_pct_dom_promiscuity MIN_PCT_DOM_PROMISCUITY
                        for promiscuous fusions, those with less than this support of the dominant scoring pair are filtered prior to applying the max_promiscuity filter.
  --min_per_id MIN_PER_ID
                        minimum percent identity for a fusion-supporting read alignment
  --max_mate_dist MAX_MATE_DIST
                        max distance between mates, also max intron length for STAR alignments

Brian Haas

Nov 17, 2023, 10:00:54 AM
to MH, Trinity_CTAT_users
Ah, yes. Let me look into that. I wish argparse would just provide the defaults, but I'll add them to the descriptions and share them shortly.
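
For reference, Python's standard argparse can append the default to each help string when the parser is built with formatter_class=argparse.ArgumentDefaultsHelpFormatter. The following is a minimal sketch, not FusionInspector's actual parser code: the build_parser helper and the "demo" program name are hypothetical, and the two options and their default values are borrowed from the usage text posted in this thread purely for illustration.

```python
import argparse


def build_parser():
    """Build a toy parser whose --help output lists each option's default."""
    parser = argparse.ArgumentParser(
        prog="demo",  # hypothetical program name, for illustration only
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Defaults are appended only to options that have a help string.
    parser.add_argument(
        "--min_per_id",
        type=int,
        default=96,
        help="minimum percent identity for a fusion-supporting read alignment",
    )
    parser.add_argument(
        "--max_mate_dist",
        type=int,
        default=100000,
        help="max distance between mates",
    )
    return parser


if __name__ == "__main__":
    # Prints usage text in which each description ends with "(default: ...)".
    print(build_parser().format_help())
```

With this formatter the defaults stay in sync with the code, at the cost of also printing defaults for options where they are not very informative (for example, boolean flags).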

Brian Haas

Jan 2, 2024, 12:58:13 PM
to Trinity_CTAT_users
Looks like I got distracted here... I've now updated the usage info to include the default values in the descriptions, and this will go into the next release. Here's the updated usage info:


optional arguments:
  --genome_lib_dir GENOME_LIB_DIR
                        genome lib directory - see http://FusionFilter.github.io for details.  Uses env var CTAT_GENOME_LIB as default
  --samples_file SAMPLES_FILE
                        samples file for smartSeq2 single cell rna-seq (format: sample(tab)/path/left.fq(tab)/path/right.fq)
  -O STR_OUT_DIR, --output_dir STR_OUT_DIR
                        output directory
  --out_prefix OUT_PREFIX
                        output filename prefix (default: finspector)
  --min_junction_reads MIN_JUNCTION_READS
                        minimum number of junction-spanning reads required (default: 0)
  --min_sum_frags MIN_SUM_FRAGS
                        minimum fusion support = ( # junction_reads + # spanning_frags )  (default: 1)
  --min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT
                        minimum number of junction reads required if breakpoint lacks involvement of only reference junctions (default: 3)
  --min_spanning_frags_only MIN_SPANNING_FRAGS_ONLY
                        minimum number of spanning frags if no junction reads are found (default: 5)
  --require_LDAS REQUIRE_LDAS
                        require long double anchor support for split reads when no spanning frags are found (default: 1)
  --max_promiscuity MAX_PROMISCUITY
                        maximum number of partners allowed for a given fusion (default: 10)
  --min_pct_dom_promiscuity MIN_PCT_DOM_PROMISCUITY
                        for promiscuous fusions, those with less than this support of the dominant scoring pair are filtered prior to applying the max_promiscuity filter. (default: 50)
  --min_per_id MIN_PER_ID
                        minimum percent identity for a fusion-supporting read alignment (default: 96)
  --max_mate_dist MAX_MATE_DIST
                        max distance between mates, also max intron length for STAR alignments (default: 100000)
  --only_fusion_reads   include only read alignments in output that support fusion
  --capture_genome_alignments
                        reports ref genome alignments too (for debugging only)
  --include_Trinity     include fusion-guided Trinity assembly
  --vis                 generate bam, bed, etc., and generate igv-reports html visualization
  --write_intermediate_results
                        generate bam, bed, etc., for intermediate aligner outputs
  --cleanup             cleanup the fusion inspector workspace, remove intermediate output files
  --CPU CPU             number of threads for multithreaded processes (default: 4)
  --annotate            annotate fusions based on known cancer fusions and those found in normal tissues
  --examine_coding_effect
                        explore impact of fusions on coding sequences
  --aligner_path ALIGNER_PATH
                        path to the aligner tool (default: uses PATH setting)
  --fusion_contigs_only
                        align reads only to the fusion contigs (note, FFPM calcs disabled in this mode)
  --extract_fusion_reads_file EXTRACT_FUSION_READS_FILE
                        file prefix to write fusion evidence reads in fastq format
  --no_remove_dups      do not exclude duplicate reads
  --version             provide version info: 2.9.0
  --no_FFPM             do not compute FFPM value - ie. using inspect instead of validate mode, in which case FFPM would not be meaningful given the full sample of reads is not evaluated
  --no_splice_score_boost
                        do not augment alignment score for spliced alignments
  --no_shrink_introns   do not shrink introns
  --shrink_intron_max_length SHRINK_INTRON_MAX_LENGTH
                        maximum length of introns when shrunk (default: 1000)
  --skip_EM             skip expectation maximization step that fractionally assigns spanning frags across multiple breakpoints
  --incl_microH_expr_brkpt_plots
                        include microhomology expression breakpoint plots
  --predict_cosmic_like
                        predict if fusion looks COSMIC-like wrt expression and microhomology characteristics. Automatically disabled if --no_FFPM is set.
  --STAR_xtra_params STAR_XTRA_PARAMS
                        extra parameters to pass on to the STAR aligner
  --no_homology_filter  no gene symbol-based blast pair homology filter or promiscuity checks to remove potential false positives
  --no_annot_filter     no annotation-based filters applied (ie. removing GTEx normal fusions)
  --max_sensitivity     max sensitivity settings (specificity unchecked) equivalent to --min_sum_frags 1 --min_spanning_frags_only 1 --min_novel_junction_support 1 --require_LDAS 0 --no_homology_filter --no_annot_filter --min_per_id 1 --no_remove_dups --skip_EM
  --extreme_sensitivity
                        extreme sensitivity. If there are evidence reads, this should ideally find them - however, false positive rate is expected to be maximally high too! Equivalent to settings:  --max_sensitivity --fusion_contigs_only  --max_mate_dist 10000000
  --FI_contigs_gtf FI_CONTIGS_GTF
                        provide the fusion inspector contig targets directly instead of making it at runtime.
  --FI_contigs_fa FI_CONTIGS_FA
                        provide the fusion inspector contigs fasta directly instead of making it at runtime
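
For anyone who wants to loosen individual thresholds rather than jump straight to --max_sensitivity, the sketch below builds a command line that passes only the settings that differ from the defaults listed above. It is an illustration under assumptions: the threshold_args helper is hypothetical, the relaxed values are arbitrary examples rather than recommendations, and the base command simply reuses the file names from the simple usage example earlier in this thread.

```python
import shlex

# Default values copied from the usage text above.
DEFAULTS = {
    "min_junction_reads": 0,
    "min_sum_frags": 1,
    "min_novel_junction_support": 3,
    "min_spanning_frags_only": 5,
    "require_LDAS": 1,
    "max_promiscuity": 10,
    "min_pct_dom_promiscuity": 50,
    "min_per_id": 96,
    "max_mate_dist": 100000,
}


def threshold_args(overrides):
    """Return CLI arguments for the settings that differ from the defaults."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown settings: {unknown}")
    args = []
    for name, default in DEFAULTS.items():
        value = overrides.get(name, default)
        if value != default:
            args += [f"--{name}", str(value)]
    return args


if __name__ == "__main__":
    # Arbitrary example values: relax three of the filters.
    relaxed = {
        "min_novel_junction_support": 1,
        "min_spanning_frags_only": 1,
        "require_LDAS": 0,
    }
    cmd = [
        "FusionInspector",
        "--fusions", "fusions.listA.txt",
        "--genome_lib", "/path/to/CTAT_genome_lib",
        "--left_fq", "rnaseq_1.fq",
        "--right_fq", "rnaseq_2.fq",
        "--out_dir", "my_FusionInspector_outdir",
        "--out_prefix", "finspector",
    ] + threshold_args(relaxed)
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

Changing one threshold at a time makes it easier to see which filter was removing a fusion of interest.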

MH

Jan 10, 2024, 12:33:19 AM
to Trinity_CTAT_users

Happy new year & thank you for your reply! :)


Could you also explain these columns in the fusioninspector output file for me: 

CDS_LEFT_RANGE  (eg. 1-1159)
CDS_RIGHT_RANGE (eg. 514-3471)
In these two columns, the value is a number range that seems to indicate where the 5' and 3' transcripts start and stop in the predicted fusion CDS. If that's the case, why is there overlap between these two ranges?

FUSION_CDS 
This column outputs the sequence of the predicted CDS of the fusion transcript. This is pretty clear, however:
(1) Where is the location of the predicted breakpoint? Or is that not indicated?
(2) What do lower and upper case mean? What about the asterisk?
(3) Is this just a combination of sequences from: start of the first exon of the 5' transcript until the breakpoint + breakpoint until the end of the last exon of the 3' transcript?

FUSION_TRANSL

Looks like this is the predicted sequence of fusion protein product. How is this determined? Is it the translation of sequence between the first START and STOP codon in CDS? 

Brian Haas

Jan 10, 2024, 8:41:56 AM
to MH, Trinity_CTAT_users
Hi,

Responses below

On Wed, Jan 10, 2024 at 12:33 AM MH <minhu...@gmail.com> wrote:

> Could you also explain these columns in the fusioninspector output file for me:
>
> CDS_LEFT_RANGE  (eg. 1-1159)
> CDS_RIGHT_RANGE (eg. 514-3471)
> In these two columns, the value is a number range that seems to indicate where the 5' and 3' transcripts start and stop in the predicted fusion CDS. If that's the case, why is there overlap between these two ranges?

Starting from the fusion breakpoint on the genome, it attempts to reconstruct fusion transcripts based on the reference genome annotations that have splicing at the breakpoints. CDS_LEFT_RANGE corresponds to the left gene and CDS_RIGHT_RANGE to the right gene, and the coordinates refer to the cDNA sequences of the respective reference transcript structures that are fused together.
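
One way to read this answer, sketched below under stated assumptions: each range is taken to be 1-based and inclusive, and to refer to its own gene's reference sequence. Under that reading the two ranges live in separate coordinate systems, so a numerical overlap between them is not a conflict. The parse_range and range_length helpers are hypothetical, and the values are the examples from the question.

```python
def parse_range(text):
    """Parse a 'start-end' string such as '1-1159' into (start, end) integers."""
    start, end = text.split("-")
    return int(start), int(end)


def range_length(text):
    """Length of a range, assuming 1-based inclusive coordinates."""
    start, end = parse_range(text)
    return end - start + 1


if __name__ == "__main__":
    examples = [("CDS_LEFT_RANGE", "1-1159"), ("CDS_RIGHT_RANGE", "514-3471")]
    for column, value in examples:
        # Each range is measured within its own gene's reference sequence.
        print(column, value, "length:", range_length(value))
```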


> FUSION_CDS
> This column outputs the sequence of the predicted CDS of the fusion transcript. This is pretty clear, however:
> (1) Where is the location of the predicted breakpoint? Or is that not indicated?

The breakpoint is that reported for the fusion in the context of the genome. If you want the transcript-relative coordinates, see the above cds-left and cds-right range info.
 
> (2) What do lower and upper case mean? What about the asterisk?

Upper vs. lower case should distinguish the two fusion partner sequences as they're derived. If you see an asterisk in a translated sequence, it represents a stop codon.
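
Given that, the junction between the two partners within FUSION_CDS can be located by finding where the letter case flips. This is a minimal sketch, assuming a single case change per sequence; the split_at_case_change helper is hypothetical and the sequence is a made-up toy example, not real FusionInspector output.

```python
def split_at_case_change(fusion_cds):
    """Split a sequence at the first position where the letter case flips."""
    if not fusion_cds:
        return "", ""
    first_is_upper = fusion_cds[0].isupper()
    for i, base in enumerate(fusion_cds):
        if base.isupper() != first_is_upper:
            return fusion_cds[:i], fusion_cds[i:]
    # No case change found: the whole sequence comes from one partner.
    return fusion_cds, ""


if __name__ == "__main__":
    toy_cds = "ATGGCCAAGgtgctgtaa"  # invented sequence for illustration
    left, right = split_at_case_change(toy_cds)
    print("left partner: ", left)
    print("right partner:", right)
    print("junction after base", len(left), "of the fusion CDS")
```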
 
> (3) Is this just a combination of sequences from: start of the first exon of the 5' transcript until the breakpoint + breakpoint until the end of the last exon of the 3' transcript?

Yes - it tries all combinations of reference gene structure isoforms, and if there's one that's in-frame, it'll prefer that over those that involve frame-shifting. 

> FUSION_TRANSL
>
> Looks like this is the predicted sequence of fusion protein product. How is this determined? Is it the translation of sequence between the first START and STOP codon in CDS?


It'll translate the entire fusion CDS sequence.  If there's a frameshift, you'll tend to see stops (asterisks) in the latter part.
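
To illustrate what translating the entire sequence looks like, here is a small sketch that translates every complete codon and writes stop codons as asterisks instead of stopping at the first one. It assumes the standard genetic code and reading frame 0; the translate helper and the toy sequences are hypothetical and are not FusionInspector's own code or output.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate all complete codons; stops become '*', unknown codons 'X'."""
    seq = cds.upper()  # case only marks the fusion partner, so ignore it here
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAGgtgctgtaa"))  # in-frame toy fusion, stop at the end
    print(translate("ATGTAAGCC"))           # translation continues past a stop
```

In a frameshifted fusion, the part downstream of the junction is read in the wrong frame, which is why premature asterisks tend to show up in the latter part of the translation.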


I hope this helps clarify things.

best,

Brian
 