Using FusionInspector with long-read data?

Asher Preska Steinberg

Dec 18, 2024, 10:08:44 AM
to Trinity_CTAT_users
Hi all,

Thank you for building these awesome tools for fusion detection. I have two questions:

1. I see that ctat-LR-fusion has adopted parts of FusionInspector, but does FusionInspector alone work on long-read data? I'm asking because I have a set of fusion calls from JAFFAL that I am trying to verify, and I wanted to feed a list of the fusions of interest to FusionInspector.

2. What resources would you recommend for ctat-LR-fusion? I have ONT cDNA data, and the FASTQ is ~88 GB in size. As an initial pass, I tried running ctat-LR-fusion with 8 CPUs, 150 GB of RAM, and a 12-hour limit on our HPC, but the job timed out. These were the flags I had:

singularity exec -e -B /data1/shahs3:/data1/shahs3 \
${sif_path} ctat-LR-fusion \
-T ${fastq} \
--genome_lib_dir ${genome_lib_path} \
--CPU 8 \
--output ${outdir} \
--vis \
--examine_coding_effect \
--extract_fusion_LR_fasta ${outdir}/fusion_reads

Thanks for your time and help!

Best,
Asher Preska Steinberg

Brian Haas

Dec 18, 2024, 10:28:34 AM
to Asher Preska Steinberg, Trinity_CTAT_users
Hi Asher,

Responses below:

> 1. I see that ctat-LR-fusion has adopted parts of FusionInspector, but does FusionInspector alone work on long-read data? I'm asking because I have a set of fusion calls from JAFFAL that I am trying to verify, and I wanted to feed a list of the fusions of interest to FusionInspector.

That's right: ctat-LR-fusion has adopted parts of FusionInspector, but FusionInspector itself only works with short reads at the moment. We could aim to make a long-read version at some point, but that doesn't help right now. In the meantime, hopefully CTAT-LR-Fusion finds the fusions of interest.
 
> 2. What resources would you recommend for ctat-LR-fusion? I have ONT cDNA data, and the FASTQ is ~88 GB in size. As an initial pass, I tried running ctat-LR-fusion with 8 CPUs, 150 GB of RAM, and a 12-hour limit on our HPC, but the job timed out. These were the flags I had:
>
> singularity exec -e -B /data1/shahs3:/data1/shahs3 \
> ${sif_path} ctat-LR-fusion \
> -T ${fastq} \
> --genome_lib_dir ${genome_lib_path} \
> --CPU 8 \
> --output ${outdir} \
> --vis \
> --examine_coding_effect \
> --extract_fusion_LR_fasta ${outdir}/fusion_reads

The initial ctat-minimap2 step is likely the bottleneck here, and it can use a lot of RAM per million long reads, which is something we still need to explore to understand better. If RAM usage is too high, the easiest fix is to partition the input into several smaller files of ~15 million reads each and run them through separately.
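
A minimal sketch of that partitioning step, assuming an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines). The helper names `split_fastq` and `count_reads` and the output naming are hypothetical and not part of CTAT-LR-Fusion:

```shell
# Hypothetical helpers for partitioning a FASTQ; not part of CTAT-LR-Fusion.
# Assumes 4 lines per record. Decompress gzipped input before use.

# usage: split_fastq <in.fastq> <reads_per_chunk> <out_prefix>
# Writes <out_prefix>001.fastq, <out_prefix>002.fastq, ...
split_fastq() {
    in_fastq=$1
    reads_per_chunk=$2
    out_prefix=$3
    lines_per_chunk=$((reads_per_chunk * 4))
    awk -v n="$lines_per_chunk" -v p="$out_prefix" '
        NR % n == 1 {                  # first line of a new chunk
            if (out) close(out)
            out = sprintf("%s%03d.fastq", p, ++chunk)
        }
        { print > out }
    ' "$in_fastq"
}

# usage: count_reads <in.fastq>
# Prints the number of reads (total lines divided by 4).
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}
```

For example, `split_fastq reads.fastq 15000000 part_` would write chunks of 15 million reads each, with the last chunk holding whatever remains.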

Another thing to check: be sure you're using the most recent version of CTAT-LR-Fusion, as it works drastically better for ONT data than earlier versions. (Maybe this solves the issue? Fingers crossed...)

Hope this helps,

Brian

Asher Preska Steinberg

Dec 18, 2024, 10:49:19 AM
to Trinity_CTAT_users
Hi Brian,

Thanks for the quick reply! Both of these responses are super helpful. 

Re: resources. Sounds good. It does appear I am using the latest version (assuming v1.0.0 is the latest, based on your GitHub releases). The RAM usage is okay; I was mainly wondering whether providing more CPUs would speed it up. Also, the partitioning route sounds appealing to me. If I do this, do I need to merge everything back together at some step?

Thanks again for your time and help.

Best,
Asher

Brian Haas

Dec 18, 2024, 10:55:18 AM
to Asher Preska Steinberg, Trinity_CTAT_users
Using more CPUs should definitely speed up certain parts, particularly the two stages that run minimap2.
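
As a rough sketch of how a higher CPU count could be combined with per-partition runs, the helper below prints one command per partition using only the core flags from the command earlier in this thread. The function name and the per-partition output directory naming are hypothetical, and `sif_path` and `genome_lib_path` are assumed to be set as in the original command:

```shell
# Hypothetical dry-run helper: prints one ctat-LR-fusion command per partition.
# Nothing is executed; review the output, then submit each line as its own job.
# Assumes $sif_path and $genome_lib_path are already set in the environment.
# usage: print_partition_cmds <cpus> <part1.fastq> [part2.fastq ...]
print_partition_cmds() {
    cpus=$1
    shift
    for fq in "$@"; do
        # Hypothetical naming: one output directory per partition.
        outdir="${fq%.fastq}_ctat_LR_fusion"
        printf 'singularity exec -e -B /data1/shahs3:/data1/shahs3 %s ctat-LR-fusion -T %s --genome_lib_dir %s --CPU %s --output %s\n' \
            "$sif_path" "$fq" "$genome_lib_path" "$cpus" "$outdir"
    done
}
```

Optional flags from the original command, such as `--vis` or `--examine_coding_effect`, can be appended to the printed lines as needed.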

Yes, you would need to combine the results afterwards. You could just treat the partitions like technical replicates. I would only split if you have more than ~20M reads, with at least 10M per partition, to make sure that relevant fusions, if found, have sufficient read support. The default minimum FFPM threshold is 0.1, which requires at least 1 fusion read per 10M total reads. You can lower this threshold for higher sensitivity, at the cost of a higher false-positive rate.
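
The arithmetic behind that threshold can be sketched as follows, taking FFPM to mean fusion-supporting reads per million total reads, which matches the 1-in-10M figure above. The helper names are hypothetical:

```shell
# Hypothetical helpers illustrating the FFPM arithmetic; not part of CTAT-LR-Fusion.

# usage: ffpm <fusion_reads> <total_reads>
# Prints fusion reads per million total reads.
ffpm() {
    awk -v f="$1" -v t="$2" 'BEGIN { printf "%.4f\n", f / t * 1e6 }'
}

# usage: min_fusion_reads <min_ffpm> <total_reads>
# Prints the smallest whole number of fusion reads that meets the threshold.
min_fusion_reads() {
    awk -v m="$1" -v t="$2" 'BEGIN {
        x = m * t / 1e6
        n = int(x)
        if (x - n > 1e-9) n++      # round up, tolerating float noise
        if (n < 1) n = 1           # at least one read is always needed
        printf "%d\n", n
    }'
}
```

With these, `min_fusion_reads 0.1 10000000` prints `1` and `min_fusion_reads 0.1 25000000` prints `3`, so a partition much smaller than 10M reads gains nothing in sensitivity at the default threshold, since a single fusion read already passes.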

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas