Hi Brian,
I'm trying to detect hepatitis B virus (HBV) integration into the human genome. Using STAR, I was able to identify chimeric human-viral reads, but I've had difficulty extracting the chimeric sequences themselves. I'd like to extract them so we can design primers and hopefully amplify some of them. I was hoping STAR-Fusion might help with this, but I've had trouble adapting it to this question. I think its main focus is fusion transcripts in malignancy rather than viral integration. I'm trying to build the CTAT genome library from concatenated GRCh38 + HBV FASTA and GTF files.
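For the extraction step, here is a minimal Python sketch. It assumes STAR was run with chimeric detection enabled (for example `--chimSegmentMin` > 0 with `--chimOutType Junctions`) so that it wrote `Chimeric.out.junction`. It also assumes the HBV contig is named `NC_003977.2`, as in the GTF line below, though the FASTA header may differ. The sketch keeps only junctions with one side on the viral contig and the other on a human contig. For each one it reports the read name and the junction coordinates as STAR gives them (columns 2 and 5, the first intronic base on each side per the STAR manual). `human_viral_junctions`, `junction_support` and `Junction` are hypothetical helpers, not part of STAR or STAR-Fusion.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# Assumption: HBV contig name as used in the concatenated FASTA/GTF.
VIRAL_CONTIG = "NC_003977.2"


class Junction(NamedTuple):
    read_name: str
    human_chrom: str
    human_pos: int  # STAR's junction coordinate on the human side
    viral_pos: int  # STAR's junction coordinate on the HBV side


def human_viral_junctions(lines: Iterable[str], viral: str = VIRAL_CONTIG) -> Iterator[Junction]:
    """Yield Chimeric.out.junction records with exactly one side on the viral contig."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and trailing comment/summary lines
        f = line.rstrip("\n").split("\t")
        if f[0] == "chr_donorA" or len(f) < 10:
            continue  # header row (newer STAR versions) or truncated line
        chr_a, pos_a, chr_b, pos_b, read = f[0], int(f[1]), f[3], int(f[4]), f[9]
        if chr_a == viral and chr_b != viral:
            yield Junction(read, chr_b, pos_b, pos_a)
        elif chr_b == viral and chr_a != viral:
            yield Junction(read, chr_a, pos_a, pos_b)


def junction_support(juncs: Iterable[Junction]) -> Counter:
    """Count reads supporting each (human_chrom, human_pos, viral_pos) breakpoint."""
    return Counter((j.human_chrom, j.human_pos, j.viral_pos) for j in juncs)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        juncs = list(human_viral_junctions(fh))
    # One line per read: read name, human side, viral side.
    for j in juncs:
        print(j.read_name, j.human_chrom, j.human_pos, j.viral_pos, sep="\t")
    # Breakpoints ranked by supporting reads, written to stderr.
    for (chrom, hpos, vpos), n in junction_support(juncs).most_common():
        print(f"{chrom}:{hpos}\t{VIRAL_CONTIG}:{vpos}\t{n} reads", file=sys.stderr)
```

Run it as `python hbv_junctions.py Chimeric.out.junction > hbv_reads.tsv` (the script name is hypothetical). The read names in the first column could then be used to pull the full reads from the FASTQ, for example with `seqtk subseq`. Reads that share a breakpoint could be aligned to get a junction sequence to design primers around.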
The build reaches 96.62% completion before failing with the following errors:
Error, no isoform struct for [unassigned_transcript_8] at /home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/util/isoform_blast_gene_chr_conversion.pl line 92, <$fh> line 240742150.
Error, cmd: /home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/util/isoform_blast_gene_chr_conversion.pl --blast_outfmt6 ref_annot.cdna.fa.allvsall.outfmt6 --gtf /home/gdskinnerlab/nes002/HBV_visium/GRCh38_HBV_concat_ed.gtf > ref_annot.cdna.fa.allvsall.outfmt6.toGenes died with ret 65280 No such file or directory at /home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/lib/Pipeliner.pm line 186.
The transcript unassigned_transcript_8 is in the HBV portion of the GTF. Its annotation looks like this:
NC_003977.2 RefSeq stop_codon 835 837 . + 0 gene_id "HBVgp2"; transcript_id "unassigned_transcript_8"; gbkey "CDS"; gene "S"; locus_tag "HBVgp2"; product "small envelope protein"; protein_id "YP_009173871.1"; exon_number "2";
I'm not sure what the "isoform struct" refers to or how to correct or bypass this error. Or is there perhaps no good way to use STAR-Fusion for this application?
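To see which transcripts in the concatenated GTF might trip this step, here is a minimal Python sketch. For each `transcript_id` it summarises the feature types, `gene_id`s and contigs. It then flags transcripts that have no `exon` lines or that are spread across more than one gene or contig. This rests on an unverified assumption about `isoform_blast_gene_chr_conversion.pl`: that it builds each isoform structure from consistent, exon-based records. One plausible trigger is a viral transcript annotated only with `CDS`/`stop_codon` lines, like the one above, but that has not been checked against the Perl source. The sketch also counts non-comment lines that are not 9 tab-separated fields, since an edited, concatenated GTF can pick up space-delimited lines. `scan_gtf` and `suspicious_transcripts` are hypothetical helper names.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

TID_RE = re.compile(r'transcript_id "([^"]+)"')
GID_RE = re.compile(r'gene_id "([^"]+)"')


def scan_gtf(lines: Iterable[str]) -> Tuple[Dict[str, Dict[str, Set[str]]], int]:
    """Summarise feature types, gene_ids and contigs per transcript_id.

    Returns (summary, malformed), where malformed counts non-comment lines
    that are not exactly 9 tab-separated fields.
    """
    summary: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {"features": set(), "genes": set(), "contigs": set()})
    malformed = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            malformed += 1
            continue
        tid = TID_RE.search(f[8])
        if not tid:
            continue  # e.g. gene-level lines without a transcript_id
        gid = GID_RE.search(f[8])
        s = summary[tid.group(1)]
        s["features"].add(f[2])
        s["contigs"].add(f[0])
        if gid:
            s["genes"].add(gid.group(1))
    return summary, malformed


def suspicious_transcripts(summary) -> List[Tuple[str, str]]:
    """Flag transcripts with no exon lines, or spread over >1 gene_id or contig."""
    flagged = []
    for tid in sorted(summary):
        s = summary[tid]
        if "exon" not in s["features"]:
            flagged.append((tid, "no exon records: " + ",".join(sorted(s["features"]))))
        if len(s["genes"]) > 1:
            flagged.append((tid, "multiple gene_ids: " + ",".join(sorted(s["genes"]))))
        if len(s["contigs"]) > 1:
            flagged.append((tid, "multiple contigs: " + ",".join(sorted(s["contigs"]))))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        summary, malformed = scan_gtf(fh)
    print(f"{malformed} non-comment lines lack 9 tab-separated fields", file=sys.stderr)
    for tid, reason in suspicious_transcripts(summary):
        print(tid, reason, sep="\t")
```

Run it as `python check_gtf.py GRCh38_HBV_concat_ed.gtf` (the script name is hypothetical). Any flagged HBV transcripts would be worth comparing against how the GRCh38 transcripts in the same file are annotated.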
I'd appreciate any thoughts you have. I've put the run script at the bottom of this message; let me know if any other information would be helpful.
Thanks,
Nicole
---------------------------------
#!/bin/bash
#SBATCH -p himem
#SBATCH -t 3-00:00:00
#SBATCH --cpus-per-task=5
#SBATCH --mem-per-cpu=100G
set -e
# Make STAR, STAR-Fusion and Singularity available on the host PATH
export PATH=$PATH:~/software/STAR-2.7.10b/source/
export PATH=$PATH:~/software/STAR-Fusion-v1.12.0
export PATH=$PATH:~/software/singularity-ce-3.11.4
# Build the CTAT genome library inside the STAR-Fusion container (-e: clean environment)
singularity exec -e \
/home/gdskinnerlab/nes002/software/star-fusion.v1.12.0.simg \
/home/gdskinnerlab/nes002/software/STAR-Fusion-v1.12.0/ctat-genome-lib-builder/prep_genome_lib.pl \
--genome_fa /home/gdskinnerlab/nes002/HBV_visium/GRCh38_HBVgenD_concat.fasta \
--gtf /home/gdskinnerlab/nes002/HBV_visium/GRCh38_HBV_concat_ed.gtf \
--fusion_annot_lib /home/gdskinnerlab/nes002/HBV_visium/fusion_lib.Mar2021.dat.gz \
--dfam_db human \
--pfam_db current