Hi all,
I would like to run STAR-Fusion to detect gene fusions using a reference built from mature RNA as well as one built from “pre-mature” RNA.
So I tried to build a reference for “pre-mature” RNA.
First, I created a GTF file in which every transcript record was relabeled as an exon and all other records were removed:
awk 'BEGIN{FS="\t"; OFS="\t"} $3 == "transcript"{ $3="exon"; print}' \
$HOME/reference_genome/Star_fusion_lib/GRCh37_gencode_v19_CTAT_lib_Mar012021.source/gencode.v19.annotation.gtf > $HOME/reference_genome/Star_fusion_lib/GRCh37_gencode_prematureRNA_6/gencode.v19.annotation_for_prematureRNA.gtf
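To illustrate what this filter does, here is a minimal sketch on a made-up GTF with four records (the DEMO source, the IDs G1 and T1, and the coordinates are invented for illustration and are not from GENCODE). Only the transcript record survives, relabeled as an exon, so the single remaining "exon" spans the whole locus including the intron.

```shell
# Hypothetical GTF: one gene, one transcript, two exons (invented values).
sample_gtf=$(printf 'chr1\tDEMO\t%s\t%s\t%s\t.\t+\t.\t%s\n' \
  gene       100 900 'gene_id "G1";' \
  transcript 100 900 'gene_id "G1"; transcript_id "T1";' \
  exon       100 200 'gene_id "G1"; transcript_id "T1";' \
  exon       700 900 'gene_id "G1"; transcript_id "T1";')

# Same filter as above: keep only transcript records and relabel them as exons.
premature_gtf=$(printf '%s\n' "$sample_gtf" |
  awk 'BEGIN{FS="\t"; OFS="\t"} $3 == "transcript"{ $3="exon"; print }')

# Prints the one remaining record: an "exon" covering 100-900.
printf '%s\n' "$premature_gtf"
```

Note that the gene records and any "##" header lines are dropped as well, because they never match $3 == "transcript".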
Second, I ran this command:
singularity exec -e ${command}/starfusion-latest.simg \
/usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
--genome_fa GRCh37.p13.genome.primary.fa \
--gtf gencode.v19.annotation_for_prematureRNA.gtf \
--fusion_annot_lib fusion_lib.Mar2021.dat.gz \
--pfam_db current \
--dfam_db human \
--output_dir ${output}
* Running CMD: STAR --runThreadN 4 --runMode genomeGenerate --genomeDir /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_genome.fa.star.idx --genomeFastaFiles /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5/GRCh37.p13.genome.primary.fa --limitGenomeGenerateRAM 40419136213 --genomeChrBinNbits 16 --sjdbGTFfile /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5/gencode.v19.annotation_for_prematureRNA.gtf --sjdbOverhang 150
Jan 17 17:13:50 ..... started STAR run
Jan 17 17:13:51 ... starting to generate Genome files
Jan 17 17:14:26 ..... processing annotations GTF
Jan 17 17:14:39 ... starting to sort Suffix Array. This may take a long time...
Jan 17 17:14:55 ... sorting Suffix Array chunks and saving them to disk...
Jan 17 17:47:05 ... loading chunks from disk, packing SA...
Jan 17 17:48:19 ... finished generating suffix array
Jan 17 17:48:19 ... generating Suffix Array index
Jan 17 17:51:28 ... completed Suffix Array index
Jan 17 17:51:28 ... writing Genome to disk ...
Jan 17 17:51:31 ... writing Suffix Array to disk ...
Jan 17 17:51:58 ... writing SAindex to disk
Jan 17 17:52:00 ..... finished successfully
* Running CMD: /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/gtf_to_gene_spans.pl /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_annot.gtf > /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_annot.gtf.gene_spans
* Running CMD: /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/gtf_file_to_feature_seqs.pl --gtf_file /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_annot.gtf --genome_fa /home/reference_genome/Star_fusion_lib/GRCh37_gencode_custum_for_prematureRNA_5_result/ref_genome.fa --seqType CDSplus > ref_annot.cdsplus.fa
* Running CMD: /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/util/dfam_repeat_masker.pl --dfam_hmm homo_sapiens_dfam.hmm --target_fa ref_annot.cdsplus.fa --out_masked ref_annot.cdsplus.dfam_masked.fa --CPU 4
* Running CMD: dfamscan.pl -fastafile ref_annot.cdsplus.fa -hmmfile homo_sapiens_dfam.hmm -dfam_outfile __dfam_ref_annot.cdsplus.fa/dfam.out --masking_thresh --cpu 4
The log stops at the dfamscan.pl command above, which has still not finished after three days.
I started prep_genome_lib.pl with the unmodified gencode.v19.annotation.gtf at the same time, and that run finished successfully the next day.
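One difference between the two runs is how much sequence each annotation describes, since every relabeled record covers a whole transcript locus including introns. I have not confirmed that this is related to the long dfamscan.pl run. A rough way to compare the two GTFs is sketched below; the exon_bases helper and the sample records are hypothetical, and on real data the function would read each GTF file on stdin.

```shell
# Sum the lengths of all exon records in a GTF read from stdin.
# Overlapping records (e.g. isoforms of one gene) are each counted in full,
# so the total includes redundant sequence.
exon_bases() {
  awk 'BEGIN{FS="\t"} $3 == "exon"{ total += $5 - $4 + 1 } END{ print total + 0 }'
}

# Hypothetical records (invented values): two real exons vs. one relabeled transcript.
original=$(printf 'chr1\tDEMO\texon\t%s\t%s\t.\t+\t.\tgene_id "G1";\n' 100 200 700 900)
premature=$(printf 'chr1\tDEMO\texon\t%s\t%s\t.\t+\t.\tgene_id "G1";\n' 100 900)

original_bases=$(printf '%s\n' "$original" | exon_bases)
premature_bases=$(printf '%s\n' "$premature" | exon_bases)

echo "original: $original_bases bases, premature: $premature_bases bases"
```

On the real files this would be run as exon_bases < gencode.v19.annotation.gtf and exon_bases < gencode.v19.annotation_for_prematureRNA.gtf.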
I would appreciate any comments on how I should handle this.
All the best.
Kosuke