How to build Genome Resource Library for pre-mature RNA


Kosuke

Jan 20, 2023, 7:14:15 PM
to STAR-Fusion

Hi all,

 

I would like to run STAR-Fusion to detect gene fusions using a reference built for “pre-mature” RNA (pre-mRNA) in addition to the usual mature-RNA reference.

So I tried to build a reference for “pre-mature” RNA.

 

First, I created a GTF file in which every transcript record was relabeled as an exon and all other records were removed.

 

awk 'BEGIN{FS="\t"; OFS="\t"} $3 == "transcript"{ $3="exon"; print}' \
$HOME/reference_genome/Star_fusion_lib/GRCh37_gencode_v19_CTAT_lib_Mar012021.source/gencode.v19.annotation.gtf \
> $HOME/reference_genome/Star_fusion_lib/GRCh37_gencode_prematureRNA_6/gencode.v19.annotation_for_prematureRNA.gtf
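A quick way to check the result of this conversion is to tally the feature types in column 3 of a GTF. This is only a minimal sketch; the function name feature_counts is made up for illustration and is not part of STAR-Fusion or the CTAT genome lib builder.

```shell
# Hypothetical helper (not part of STAR-Fusion): tally the feature types
# found in column 3 of a GTF file, skipping '#' header lines.
feature_counts() {
    awk 'BEGIN{FS="\t"} $0 !~ /^#/ { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}
```

Running feature_counts on the converted GTF should list only "exon", with a count equal to the number of "transcript" records in the original annotation.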

 

Second, I ran this command:

 

singularity exec -e ${command}/starfusion-latest.simg \
/usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
--genome_fa GRCh37.p13.genome.primary.fa \
--gtf gencode.v19.annotation_for_prematureRNA.gtf \
--fusion_annot_lib fusion_lib.Mar2021.dat.gz \
--pfam_db current \
--dfam_db human \
--output_dir ${output}

 

 

* Running CMD: STAR --runThreadN 4 --runMode genomeGenerate --genomeDir /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_genome.fa.star.idx   --genomeFastaFiles /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5/GRCh37.p13.genome.primary.fa  --limitGenomeGenerateRAM 40419136213  --genomeChrBinNbits 16  --sjdbGTFfile /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5/gencode.v19.annotation_for_prematureRNA.gtf  --sjdbOverhang 150

Jan 17 17:13:50 ..... started STAR run

Jan 17 17:13:51 ... starting to generate Genome files

Jan 17 17:14:26 ..... processing annotations GTF

Jan 17 17:14:39 ... starting to sort Suffix Array. This may take a long time...

Jan 17 17:14:55 ... sorting Suffix Array chunks and saving them to disk...

Jan 17 17:47:05 ... loading chunks from disk, packing SA...

Jan 17 17:48:19 ... finished generating suffix array

Jan 17 17:48:19 ... generating Suffix Array index

Jan 17 17:51:28 ... completed Suffix Array index

Jan 17 17:51:28 ... writing Genome to disk ...

Jan 17 17:51:31 ... writing Suffix Array to disk ...

Jan 17 17:51:58 ... writing SAindex to disk

Jan 17 17:52:00 ..... finished successfully

* Running CMD: /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/gtf_to_gene_spans.pl /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_annot.gtf > /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_annot.gtf.gene_spans

* Running CMD: /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/gtf_file_to_feature_seqs.pl --gtf_file /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_annot.gtf --genome_fa /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_genome.fa --seqType CDSplus > ref_annot.cdsplus.fa

* Running CMD: /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/dfam_repeat_masker.pl --dfam_hmm homo_sapiens_dfam.hmm --target_fa ref_annot.cdsplus.fa --out_masked ref_annot.cdsplus.dfam_masked.fa --CPU 4

* Running CMD: dfamscan.pl -fastafile ref_annot.cdsplus.fa -hmmfile homo_sapiens_dfam.hmm -dfam_outfile __dfam_ref_annot.cdsplus.fa/dfam.out --masking_thresh --cpu 4

 

The dfamscan.pl command had still not finished after three days.

I started prep_genome_lib.pl with the unmodified gencode.v19.annotation.gtf at the same time, and that run finished successfully the next day.
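In the converted GTF each "exon" spans a whole transcript, introns included, so the sequences extracted from it are much longer than those from the original annotation. Assuming the run time of the repeat-masking step depends on the amount of sequence it is given (an assumption that has not been verified here), the two GTFs can be compared by summing their exon lengths. The function name exon_bases below is hypothetical, not a STAR-Fusion utility.

```shell
# Hypothetical helper (not part of STAR-Fusion): sum the lengths of all
# "exon" records in a GTF. Columns 4 and 5 are 1-based inclusive start/end.
# Overlapping records are counted once per record, not merged.
exon_bases() {
    awk 'BEGIN{FS="\t"} $0 !~ /^#/ && $3 == "exon" { total += $5 - $4 + 1 }
         END { printf "%.0f\n", total }' "$1"
}
```

For example, exon_bases gencode.v19.annotation.gtf and exon_bases gencode.v19.annotation_for_prematureRNA.gtf print one number each, and the ratio between them gives a rough idea of how much more sequence the modified annotation produces.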

 

I would appreciate any comments on how I should handle this.

 

All the best.

Kosuke

Brian Haas

Feb 13, 2023, 7:00:48 PM
to STAR-Fusion
Hi,

I don't think STAR-Fusion in its current form would work well for pre-mRNA because of how it handles its repeat filtering.  It would require some substantial reengineering to make that work.

I'd say try Arriba and see if that reports the intronic breakpoints that you're interested in.  I'll put some thought into what STAR-F might need to be more useful in this area.

best,

~b