STAR-Fusion to detect viral genomic insertions


Nicole Skinner

Jul 23, 2023, 12:54:43 PM
to STAR-Fusion
Hi Brian,

I'm trying to detect hepatitis B viral integration into the human genome. I was able to identify chimeric human-viral sequences using STAR but was having difficulty extracting the chimeric sequences themselves (something I'd like to do so we can make primers and hopefully amplify some of them). I was hoping STAR-Fusion might help me with this but have been having trouble adapting it to this question (rather than looking for fusion proteins in malignancy, which I think is the main focus of STAR-Fusion). I'm trying to build the CTAT genome library using concatenated (GRCh38 and hepatitis B virus) fasta and gtf files.
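For the step of pulling out the human-viral chimeric reads that STAR already reported, a minimal sketch follows. It assumes STAR was run with chimeric detection writing a tab-delimited `Chimeric.out.junction` file in which column 1 is the donor contig, column 4 the acceptor contig, and column 10 the read name. The function name `hbv_chimeric_reads` is hypothetical, and the default viral contig name `NC_003977.2` is taken from the GTF row quoted later in this message; it may differ from the sequence name in the concatenated FASTA.

```shell
# Hypothetical helper: list the names of reads whose chimeric junction joins
# the viral contig to any other contig (exactly one side is viral).
# Assumed column layout of Chimeric.out.junction:
#   $1 = donor contig, $4 = acceptor contig, $10 = read name
hbv_chimeric_reads() {
    local junctions="$1"
    local virus="${2:-NC_003977.2}"   # assumed contig name; pass your own if different
    awk -F'\t' -v v="$virus" '
        /^#/ || $1 == "chr_donorA" { next }    # skip comment and header lines
        ($1 == v) != ($4 == v)     { print $10 }   # one side viral, the other not
    ' "$junctions" | sort -u
}
```

Calling `hbv_chimeric_reads Chimeric.out.junction NC_003977.2 > hbv_read_names.txt` would give a list of read names that can then be used to pull the corresponding sequences out of the original FASTQ files for primer design.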

This reaches 96.62% done before failing. I get the following errors:
Error, no isoform struct for [unassigned_transcript_8] at /home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/util/isoform_blast_gene_chr_conversion.pl line 92, <$fh> line 240742150.
Error, cmd: /home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/util/isoform_blast_gene_chr_conversion.pl --blast_outfmt6 ref_annot.cdna.fa.allvsall.outfmt6 --gtf /home/gdskinnerlab/nes002/HBV_visium/GRCh38_HBV_concat_ed.gtf > ref_annot.cdna.fa.allvsall.outfmt6.toGenes died with ret 65280 No such file or directory at /home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/lib/Pipeliner.pm line 186.

unassigned_transcript_8 is in the HBV genome. The annotation looks like this:
NC_003977.2 RefSeq stop_codon 835 837 . + 0 gene_id "HBVgp2"; transcript_id "unassigned_transcript_8"; gbkey "CDS"; gene "S"; locus_tag "HBVgp2"; product "small envelope protein"; protein_id "YP_009173871.1"; exon_number "2";

I'm not sure what the isoform structure is or how to correct/bypass this error. Or perhaps there isn't a good way to use STAR-Fusion for this application?
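One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the row shown above is a `stop_codon` feature, and RefSeq-derived viral annotations sometimes carry only CDS-type rows with no `exon` rows for a transcript. If the library builder derives its isoform structures from `exon` features, such transcripts could end up without one. The sketch below lists every `transcript_id` in a GTF that has no `exon` row; the function name `transcripts_without_exons` is hypothetical.

```shell
# Hypothetical helper: print transcript_ids that appear in a GTF
# but never on a row whose feature type (column 3) is "exon".
transcripts_without_exons() {
    local gtf="$1"
    awk -F'\t' '
        /^#/ { next }                                  # skip comment lines
        match($9, /transcript_id "[^"]+"/) {
            # drop the 15-character prefix (transcript_id, space, quote)
            # and the closing quote to leave the bare identifier
            id = substr($9, RSTART + 15, RLENGTH - 16)
            seen[id] = 1
            if ($3 == "exon") has_exon[id] = 1
        }
        END { for (id in seen) if (!(id in has_exon)) print id }
    ' "$gtf" | sort
}
```

Running `transcripts_without_exons GRCh38_HBV_concat_ed.gtf` would show whether unassigned_transcript_8 and the other HBV transcripts fall into this group. Whether adding matching `exon` rows resolves the error is untested here.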

I'd appreciate any thoughts you have. I'm putting the run script at the bottom of this message but let me know if there's any other information that would be helpful.

Thanks,
Nicole
---------------------------------

#!/bin/bash
#SBATCH -p himem
#SBATCH -t 3-00:00:00
#SBATCH --cpus-per-task=5
#SBATCH --mem-per-cpu=100G

set -e

export PATH=$PATH:~/software/STAR-2.7.10b/source/
export PATH=$PATH:~/software/STAR-Fusion-v1.12.0
export PATH=$PATH:~/software/singularity-ce-3.11.4

singularity exec -e \
/home/gdskinnerlab/nes002/software/star-fusion.v1.12.0.simg \
/home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/prep_genome_lib.pl \
           --genome_fa /home/gdskinnerlab/nes002/HBV_visium/GRCh38_HBVgenD_concat.fasta  \
           --gtf /home/gdskinnerlab/nes002/HBV_visium/GRCh38_HBV_concat_ed.gtf \
           --fusion_annot_lib /home/gdskinnerlab/nes002/HBV_visium/fusion_lib.Mar2021.dat.gz \
           --dfam_db human \
           --pfam_db current

Brian Haas

Jul 23, 2023, 1:12:35 PM
to Nicole Skinner, STAR-Fusion
Hi Nicole,

We have a tool called ctat-VIF that should do this for you, instead of
using STAR-Fusion:

https://github.com/broadinstitute/CTAT-VirusIntegrationFinder/wiki

It works for HBV and a bunch of other viruses, hopefully with minimal
setup on top of a 'regular' CTAT genome lib.

Please give this a try and let me know if there's more we can do here
to help with experimental follow-up, as I can envision having some
additional scripts to extract the sequences, etc.

best,

~b
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

Nicole Skinner

Jul 23, 2023, 3:06:28 PM
to STAR-Fusion
Thanks so much -- I didn't know about this tool. I'll reach back out if I have trouble extracting the sequence data.

Nicole