detect gene fusion between virus and human

evan

Jun 16, 2025, 4:25:29 AM
to Trinity_CTAT_users
Hi,

I am using CTAT-LR-fusion to detect gene fusions between human and Influenza A virus (IAV).
I used ctat-genome-lib-builder to build a new ctat genome lib that includes both human and IAV, and then ran CTAT-LR-fusion to detect gene fusions between virus and human. Why does the final output contain only human-human fusions and no human-virus ones? I prepared the IAV genome and annotation myself.
Here are the commands I used:
prep_genome_lib.pl \
  --genome_fa $genome \
  --gtf $gtf \
  --annot_filter_rule $pm \
  --pfam_db current \
  --dfam_db human \
  --human_gencode_filter \
  --CPU 64 \
  --output_dir $out


ctat-LR-fusion --LR_bam $bam \
               --genome_lib_dir $genome_dir \
               --CPU 64 \
               --vis \
               --output $outdir \
               --examine_coding_effect \
               --extract_fusion_LR_fasta $fusion_fa \
               --no_abundance_filter \
               --no_annot_filter \
               --min_trans_overlap_length 30

Thanks for your help!

Brian Haas

Jun 16, 2025, 7:33:43 AM
to evan, Trinity_CTAT_users
Hi,


Will that work for you?



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

evan

Jun 16, 2025, 7:54:17 AM
to Trinity_CTAT_users

Oh, thank you. But my data is ONT, so it looks like CTAT-VIF may not be a good fit.

Brian Haas

Jun 16, 2025, 7:59:55 AM
to evan, Trinity_CTAT_users
I see. I haven't tried to use ctat-LR-fusion in the virus/human context. If you incorporate the viral genome into the ctat genome lib along with the virus reference annotations, I suppose it could work. You might dig through some of the intermediate outputs to see if the virus shows up in any of them. If it does, then it could be a downstream filtering issue that could be explored.

evan

Jun 16, 2025, 8:37:56 AM
to Trinity_CTAT_users
Thank you, I will do that. In addition, since building the ctat genome lib takes quite a long time, I'd also like to ask whether there are any issues with the command I used to build it. Is there a way to check from the generated ctat_genome_lib_build_dir whether the viral genome was properly merged into the human genome? IAV doesn't have an Ensembl annotation, so I converted the NCBI annotation into an Ensembl-style annotation file. Could you help me check whether the annotation is appropriate?
H9N2_ensembl_like.gtf
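
One way to approach the "was the viral genome merged?" question is to look for the viral contig names in the library's genome FASTA and annotation GTF. The sketch below assumes the library directory holds files named ref_genome.fa and ref_annot.gtf (adjust the names if your build differs) and that the viral segments are named like PB1 or HA; those segment names are placeholders for whatever appears in your own FASTA headers.

```python
import os


def fasta_contigs(fasta_path):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                words = line[1:].split()
                if words:
                    names.add(words[0])
    return names


def gtf_contigs(gtf_path):
    """Return the set of contig names found in column 1 of a GTF file."""
    names = set()
    with open(gtf_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def check_viral_contigs(lib_dir, expected,
                        fasta_name="ref_genome.fa",   # assumed file name
                        gtf_name="ref_annot.gtf"):    # assumed file name
    """Report, per expected contig, whether it is in the genome and the GTF."""
    in_fasta = fasta_contigs(os.path.join(lib_dir, fasta_name))
    in_gtf = gtf_contigs(os.path.join(lib_dir, gtf_name))
    return {name: {"genome": name in in_fasta, "annotation": name in in_gtf}
            for name in expected}
```

Calling check_viral_contigs("ctat_genome_lib_build_dir", ["PB1", "HA"]) returns one entry per segment. A segment that is present in the genome but absent from the annotation would point at the GTF or at annotation filtering during the build, rather than at the genome merge itself.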

Brian Haas

Jun 16, 2025, 9:20:13 AM
to evan, Trinity_CTAT_users
The annotation file looks fine afaict.

For testing, maybe try creating a chimeric virus/human transcript fusion, simulate long reads, and see if ctat-lr-fusion can find it.
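
The test suggested above could be prototyped along these lines. This is a toy sketch, not a realistic ONT error model: it joins the start of one transcript to the end of another and draws random sub-reads with substitution errors only. The function names and default rates are illustrative assumptions, and a dedicated long-read simulator would give more realistic reads.

```python
import random


def make_chimera(left_seq, right_seq, left_len, right_len):
    """Join the first left_len bases of left_seq to the last right_len bases of right_seq."""
    if left_len > len(left_seq) or right_len > len(right_seq):
        raise ValueError("requested length exceeds the input sequence")
    return left_seq[:left_len] + right_seq[len(right_seq) - right_len:]


def simulate_reads(template, n_reads, min_len, sub_rate=0.05, seed=0):
    """Draw random sub-reads from template, adding substitution errors only."""
    if min_len > len(template):
        raise ValueError("min_len is longer than the template")
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    reads = []
    for i in range(n_reads):
        length = rng.randint(min_len, len(template))
        start = rng.randint(0, len(template) - length)
        bases = []
        for base in template[start:start + length]:
            if rng.random() < sub_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((f"sim_read_{i}", "".join(bases)))
    return reads


def write_fasta(records, path):
    """Write (name, sequence) pairs as a FASTA file."""
    with open(path, "w") as out:
        for name, seq in records:
            out.write(f">{name}\n{seq}\n")
```

For the fusion to be detectable, min_len should be large enough that most reads span the junction with a reasonable amount of sequence on both sides. The resulting FASTA can then be aligned and passed to ctat-LR-fusion in the same way as the real data.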


evan

Jun 19, 2025, 6:56:42 AM
to Trinity_CTAT_users

Thank you. After a few days of testing, I found that the exon entries in the initial annotation file are very important. Missing exons or incorrectly formatted exon records were among the reasons I didn't get any output. In addition, for the script FusionInspector/util/fusion_pair_to_mini_genome_join.pl, the --genome_flank <int> parameter has a default value of 1000, which seems too large for viral segments and causes errors like:

[1/1 = 100.0 % done]    [E::hts_parse_region] Coordinates must be > 0
[W::fai_get_val] Reference PB1:-975-3295 not found in FASTA file, returning empty sequence
[E::hts_parse_region] Coordinates must be > 0
[W::fai_get_val] Reference PB1:-975-3295 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in PB1:-975-3295

Can this parameter be adjusted, for example, to 20? Do you have any good suggestions on this?
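
The region PB1:-975-3295 in the log is consistent with a feature spanning 25..2295 being extended by 1000 bases on each side. That span is inferred from the log, not stated in the thread. The sketch below works through the arithmetic, assuming the window is simply start - flank to end + flank and assuming a PB1 segment length of 2341 bases (a typical value, used here only for illustration; substitute the length of your own segment).

```python
def flanked_window(start, end, flank, contig_len):
    """Return (window_start, window_end, out_of_bounds) for a flanked feature.

    Coordinates are 1-based and inclusive, as in GTF and samtools regions.
    """
    lo = start - flank
    hi = end + flank
    return lo, hi, (lo < 1 or hi > contig_len)


def max_safe_flank(start, end, contig_len):
    """Largest flank that keeps the window inside the contig on both sides."""
    return max(0, min(start - 1, contig_len - end))


# Assumed values: feature 25..2295 (inferred from the log), contig length 2341.
print(flanked_window(25, 2295, 1000, 2341))  # window with the default flank
print(flanked_window(25, 2295, 20, 2341))    # window with a flank of 20
print(max_safe_flank(25, 2295, 2341))        # largest flank that still fits
```

Under these assumptions the default flank of 1000 overruns both ends of the segment, while a flank of 20 stays in bounds. Whether a flank that small still gives FusionInspector enough context is a separate question.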

Brian Haas

Jun 19, 2025, 8:12:52 AM
to evan, Trinity_CTAT_users
Nice detective work there!

One thought would be to pad your viral genome by a few kb of N characters on each side, and adjust your GTF feature coordinates accordingly. This should allow FusionInspector to build the fusion contigs.
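
A minimal sketch of the padding idea, assuming a plain FASTA and a tab-delimited GTF. The function names, the 60-column line width, and the choice of which contigs to pad are illustrative assumptions. Each listed contig gets `pad` N characters on both ends, and every GTF feature on those contigs is shifted right by `pad` so it still points at the same bases.

```python
def pad_sequence(seq, pad):
    """Add `pad` N characters to each end of a sequence."""
    return "N" * pad + seq + "N" * pad


def read_fasta(path):
    """Return a list of (header, sequence) pairs from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pad_fasta(in_path, out_path, pad, contigs, width=60):
    """Write a copy of the FASTA with the listed contigs N-padded."""
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            words = header.split()
            if words and words[0] in contigs:
                seq = pad_sequence(seq, pad)
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


def shift_gtf_line(line, pad, contigs):
    """Shift start/end (columns 4 and 5) by `pad` for features on padded contigs."""
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[0] not in contigs:
        return line
    fields[3] = str(int(fields[3]) + pad)
    fields[4] = str(int(fields[4]) + pad)
    return "\t".join(fields) + newline


def shift_gtf(in_path, out_path, pad, contigs):
    """Write a copy of the GTF with coordinates shifted on padded contigs."""
    with open(in_path) as fh, open(out_path, "w") as out:
        for line in fh:
            out.write(shift_gtf_line(line, pad, contigs))
```

With pad=3000, a feature at PB1:25-2295 becomes PB1:3025-5295, so a 1000-base flank no longer produces negative coordinates. The padded FASTA and shifted GTF would then need to go through the genome lib build again, and the same pad value must be used for both files.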
