problem indexing viral genome

James Weger

Dec 7, 2017, 9:53:47 AM
to rna-star
Hello all, 

I'm trying to index my viral genome in the same manner as recommended by STAR-Fusion. 

STAR --runThreadN 4 --runMode genomeGenerate --genomeDir ref_genome.fa.star.idx --genomeFastaFiles ref_genome.fa --limitGenomeGenerateRAM 40419136213 --genomeChrBinNbits 16 --sjdbGTFfile ref_annot.gtf --sjdbOverhang 100

But it's giving me the following error:

Dec 07 03:37:05 ..... processing annotations GTF
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check
Aborted (core dumped)

I've been playing around with my fasta and GTF to make sure they agree with each other, to no avail. 

For reference, here are my GTF and FASTA. I based the chromosome names on the small.fa example provided with STAR-Fusion.

gtf

chr1 SNPGenie gene 1 10806 . + . gene_id "polyprotein";
chr1 SNPGenie exon 106 10377 . + . gene_id "polyprotein";

fasta 

>chr1 1
GTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAGAGCTAACAACAGTATCAACAG
GTTTAATTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAGAAGAAATCCGGAGGATTCCG
GATTGTCAATATGCTAAAACGCGGAGTAGCCCGTGTAAACCCCTTGGGAGGTTTGAAGAGGCTGCCAGCC
GGACTTCTGCTGGGTCATGGACCCATCAGAATGGTTTTGGCGATATTAGCCTTTTTGAGATTCACAGCAA
TCAAGCCATCACTGGGCCTCATCAACAGATGGGGT

Here is the exact command I'm running 

STAR --runThreadN 8 --runMode genomeGenerate --genomeDir ref_genome.fa.star.idx --genomeFastaFiles sequence.fasta --limitGenomeGenerateRAM 40419136213 --genomeChrBinNbits 16 --sjdbGTFfile sequence.gtf --sjdbOverhang 100 --genomeSAindexNbases 3

Any help you can provide would be greatly appreciated. 

James
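
The STAR manual suggests scaling --genomeSAindexNbases down for small genomes to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that arithmetic follows, assuming a genome length of 10,806 bases (taken from the gene end coordinate in the GTF above, not from the FASTA excerpt). The helper name is hypothetical, and rounding down to an integer is an assumption.

```python
import math


def recommended_sa_index_nbases(genome_length: int) -> int:
    """Evaluate min(14, log2(GenomeLength)/2 - 1), rounded down (assumption)."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


# Assumed genome length: the gene end coordinate from the GTF in this post.
print(recommended_sa_index_nbases(10806))
```

For a 10,806-base genome this evaluates to 5.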

Alexander Dobin

Dec 11, 2017, 4:28:21 PM
to rna-star
Hi James,

The GTF file should contain "transcript_id" attributes on the exon lines (those where field 3 is "exon"). Since you probably have only one transcript per gene, you can set them equal to the gene_id, e.g.
chr1 SNPGenie exon 106 10377 . + . gene_id "polyprotein"; transcript_id "polyprotein"

Cheers
Alex
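
The fix Alex describes can be sketched in Python for files with many exon lines. This is a rough illustration, not part of STAR. The function name `add_transcript_id` is hypothetical, and the sketch assumes that none of the first eight columns contains whitespace and that every exon line carries a quoted gene_id.

```python
import re


def add_transcript_id(gtf_line: str) -> str:
    """Append transcript_id (copied from gene_id) to an exon line lacking one."""
    line = gtf_line.rstrip("\n")
    # Split into 8 leading columns plus the attribute column.
    # Assumption: the leading columns contain no internal whitespace.
    cols = line.split(None, 8)
    if line.startswith("#") or len(cols) < 9:
        return line  # comment, blank, or malformed line: leave untouched
    feature, attrs = cols[2], cols[8]
    if feature != "exon" or "transcript_id" in attrs:
        return line  # only exon lines without a transcript_id are changed
    match = re.search(r'gene_id "([^"]+)"', attrs)
    if match is None:
        return line  # no gene_id to copy from
    stripped = line.rstrip()
    separator = "" if stripped.endswith(";") else ";"
    return f'{stripped}{separator} transcript_id "{match.group(1)}";'


if __name__ == "__main__":
    for original in (
        'chr1 SNPGenie gene 1 10806 . + . gene_id "polyprotein";',
        'chr1 SNPGenie exon 106 10377 . + . gene_id "polyprotein";',
    ):
        print(add_transcript_id(original))
```

The original column delimiters are left as they were; only the attribute text at the end of each exon line is extended.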

James Weger

Dec 13, 2017, 2:59:35 PM
to rna-star
Alex, 

Thanks so much. That appeared to solve the indexing issue. Now I'm trying to run STAR-Fusion and am running into more problems.

Here is what I'm getting 

(py2.7) weger@IDANGS01:/data/ebel_lab/Weger/star_fusion_test$ STAR-Fusion --genome_lib_dir /data/ebel_lab/Weger/star_fusion_test/ --left_fq /data/ebel_lab/Weger/star_fusion_test/ZIKV-stock_S11_trim.fastq --output_dir /data/ebel_lab/Weger/star_fusion_test
* Running CMD: STAR --genomeDir /data/ebel_lab/Weger/star_fusion_test//ref_genome.fa.star.idx  --readFilesIn /data/ebel_lab/Weger/star_fusion_test/ZIKV-stock_S11_trim.fastq   --outReadsUnmapped None  --chimSegmentMin 12  --chimJunctionOverhangMin 12  --alignSJDBoverhangMin 10  --alignMatesGapMax 100000  --alignIntronMax 100000  --chimSegmentReadGapMax 3  --alignSJstitchMismatchNmax 5 -1 5 5  --runThreadN 4 --limitBAMsortRAM 31532137230  --outSAMstrandField intronMotif  --outSAMtype BAM SortedByCoordinate  --twopassMode Basic
Dec 13 12:26:49 ..... started STAR run
Dec 13 12:26:49 ..... loading genome
Dec 13 12:26:49 ..... started 1st pass mapping
Dec 13 12:28:31 ..... finished 1st pass mapping
Dec 13 12:28:31 ..... inserting junctions into the genome indices
Dec 13 12:28:31 ..... started mapping
Dec 13 12:30:17 ..... started sorting BAM
Dec 13 12:30:17 ..... finished successfully
* Running CMD: /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/STAR-Fusion.predict  -J Chimeric.out.junction  --genome_lib_dir /data/ebel_lab/Weger/star_fusion_test/  --min_junction_reads 1  --min_sum_frags 2  --min_novel_junction_support 3  -O /data/ebel_lab/Weger/star_fusion_test/star-fusion.preliminary/star-fusion
CMD: mkdir -p /data/ebel_lab/Weger/star_fusion_test/star-fusion.preliminary/star-fusion.predict.intermediates_dir
-parsing GTF file: /data/ebel_lab/Weger/star_fusion_test//ref_annot.gtf
Use of uninitialized value $strand in string eq at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/../PerlLib/GTF_utils.pm line 92, <$fh> line 1.
Use of uninitialized value $annot in pattern match (m//) at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/../PerlLib/GTF_utils.pm line 94, <$fh> line 1.
Use of uninitialized value $annot in concatenation (.) or string at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/../PerlLib/GTF_utils.pm line 94, <$fh> line 1.
Error, cannot get gene_id from  of line
chr1 SNPGenie exon 106 10377 . + . gene_id "polyprotein"; transcript_id "polyprotein" at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/../PerlLib/GTF_utils.pm line 94, <$fh> line 1.
        GTF_utils::GTF_to_gene_objs("/data/ebel_lab/Weger/star_fusion_test//ref_annot.gtf") called at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/../PerlLib/GTF_utils.pm line 30
        GTF_utils::index_GTF_gene_objs_from_GTF("/data/ebel_lab/Weger/star_fusion_test//ref_annot.gtf", HASH(0x2133a50)) called at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/../PerlLib/GTF_utils.pm line 20
        GTF_utils::index_GTF_gene_objs("/data/ebel_lab/Weger/star_fusion_test//ref_annot.gtf", HASH(0x2133a50)) called at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/STAR-Fusion.predict line 404
        main::parse_GTF_features("/data/ebel_lab/Weger/star_fusion_test//ref_annot.gtf", HASH(0x2aeb978), HASH(0x2aeb9a8)) called at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/STAR-Fusion.predict line 112
Error, cmd: /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/util/STAR-Fusion.predict  -J Chimeric.out.junction  --genome_lib_dir /data/ebel_lab/Weger/star_fusion_test/  --min_junction_reads 1  --min_sum_frags 2  --min_novel_junction_support 3  -O /data/ebel_lab/Weger/star_fusion_test/star-fusion.preliminary/star-fusion  died with ret 6400 at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/PerlLib/Pipeliner.pm line 79.
        Pipeliner::run(Pipeliner=HASH(0x1844ae0)) called at /home/weger/miniconda3/envs/py2.7/lib/STAR-Fusion/STAR-Fusion line 395

Do you see any glaring issues? Thanks again for your help. 

James

Alexander Dobin

Dec 14, 2017, 12:38:10 PM
to rna-star
Hi James,

This seems to be an issue with the STAR-Fusion code, so I would refer you to the STAR-Fusion group; Brian Haas will be more qualified to help you.

Cheers
Alex
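
One thing that may be worth checking before following up: the "Use of uninitialized value $strand" and "$annot" warnings on line 1 would be consistent with the parser finding fewer than nine fields on that line, for example if it splits on tabs and the GTF is space-separated. That is an assumption; the thread does not confirm the cause, and the spaces in the posted lines may simply be how the forum rendered tabs. A minimal Python sketch of such a check, where `find_non_tab_lines` is a hypothetical helper:

```python
def find_non_tab_lines(gtf_text: str) -> list[tuple[int, int]]:
    """Return (line_number, field_count) for feature lines that do not
    have exactly 9 tab-separated fields."""
    problems = []
    for number, line in enumerate(gtf_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        field_count = len(line.split("\t"))
        if field_count != 9:
            problems.append((number, field_count))
    return problems


if __name__ == "__main__":
    # The exon line as it appears in the post (space-separated as shown).
    sample = 'chr1 SNPGenie exon 106 10377 . + . gene_id "polyprotein"; transcript_id "polyprotein"\n'
    for number, field_count in find_non_tab_lines(sample):
        print(f"line {number}: {field_count} tab-separated field(s), expected 9")
```

To run it on the real file, read the GTF with `open(path).read()` and pass the text in; an empty result means every feature line has nine tab-separated fields.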