"terminate called after throwing an instance of 'std::out_of_range'" when indexing a Genome


rubi

Sep 26, 2015, 7:03:28 PM
to rna-star
Hi,

I'm trying to index a genome (more accurately a chromosome) + GTF file and getting this error:
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check

Here's the command I'm running (using STAR_2.4.2a):
STAR --runMode genomeGenerate --genomeDir <genomeDir> --genomeFastaFiles chr19.fa --runThreadN 16 --sjdbGTFfile chr19.gtf --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 74


where chr19.fa is mm10 chr19 and chr19.gtf is the attached GTF file.

I'm allocating 16 cores and 80GB of RAM.

Help would be greatly appreciated.


chr19.gtf

Kirill Tsyganov

Sep 27, 2015, 6:09:17 PM
to rubi, rna-star
Hi Rubi,

I think your problem is that you are not accounting for the "small genome" case. Since you are only using one chromosome as your reference, you need to specify a lower `--genomeSAindexNbases` value. Here is a copy from the STAR manual, page 6:

2.2.5 Very small genome.

For small genomes, the parameter --genomeSAindexNbases needs to be scaled down, with a typical
value of min(14, log2(GenomeLength)/2 - 1). For example, for 1 megaBase genome, this is equal
to 9, for 100 kiloBase genome, this is equal to 7.


In my mm10 reference, chromosome 19 has a length of 61431566, and using the formula above I get a value of ~11.9. Since the parameter takes an integer, I would try setting `--genomeSAindexNbases` to 11 or 12.
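For anyone who wants to check the arithmetic, the manual's formula can be evaluated with a few lines of Python. This is only a sketch: the function name `suggested_sa_index_nbases` is made up for illustration and is not part of STAR, and whether to round down or to the nearest integer is a judgment call (the manual's own examples, 9 and 7, look like they are rounded to the nearest integer).

```python
import math


def suggested_sa_index_nbases(genome_length):
    """Evaluate min(14, log2(GenomeLength)/2 - 1) from the STAR manual.

    Hypothetical helper, not a STAR API. Returns the unrounded value so
    the caller can decide how to turn it into an integer.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return min(14.0, math.log2(genome_length) / 2 - 1)


# mm10 chr19 length quoted in this thread
chr19_length = 61431566
raw_value = suggested_sa_index_nbases(chr19_length)

print("raw value:", raw_value)
print("rounded down:", math.floor(raw_value))
print("rounded to nearest:", round(raw_value))
```

Either integer candidate can then be passed as `--genomeSAindexNbases` in the genomeGenerate command.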

Cheers
 


rubi

Sep 27, 2015, 9:14:11 PM
to rna-star, nimrod.r...@gmail.com
Thanks for the suggestion Kirill, but I think it has to do with the GTF file I'm using (actually I need to index the entire mm10 genome but only uploaded the chr19 GTF due to size).

In the log file everything seems to proceed fine until the processing of the GTF file, where I get a warning for every GTF line. For example:
Sep 27 21:00:18 ..... Processing annotations GTF
WARNING: while processing sjdbGTFfile=/net/dulacfs2/dulacfs2/Users/dfernand/de/refdata/star.1.ucsc/chr19.gtf: no transcript_id for line:
chr19 mm10_knownGene exon 3066556 3066627 0 + . gene_id "uc033hid.1"; transcript_id "uc033hid.1"; gene_name "uc033hid.1";
WARNING: while processing sjdbGTFfile=/net/dulacfs2/dulacfs2/Users/dfernand/de/refdata/star.1.ucsc/chr19.gtf: no transcript_id for line:
chr19 mm10_knownGene exon 3576263 3576335 0 + . gene_id "uc033hif.1"; transcript_id "uc033hif.1"; gene_name "uc033hif.1";
 

This GTF was generated by UCSC, but the same warning also happens for the GENCODE M6 GTF.

Kirill Tsyganov

Sep 27, 2015, 9:55:59 PM
to rubi, rna-star
This is a strange WARNING, since I can see `transcript_id` in that line...

If this is the actual command you are running, then I think the problem might be in `--sjdbGTFtagExonParentTranscript`:

STAR --runMode genomeGenerate --genomeDir <genomeDir> --genomeFastaFiles chr19.fa --runThreadN 16 --sjdbGTFfile chr19.gtf --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 74

Are you passing `Parent` as the value of `--sjdbGTFtagExonParentTranscript`? I don't think you need to do that, and it might be what is causing the problem.

Do you get the same error if you don't specify `--sjdbGTFtagExonParentTranscript` at all?
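To see why the `Parent` value would not match, here is a small Python sketch that lists the attribute keys on one of the exon lines from your log. The helper `gtf_attribute_keys` is hypothetical and written just for this post, and I am assuming the columns in the real file are tab-separated (the log excerpt shows them with spaces). As far as I know, `Parent` is the attribute that GFF3 files use, while GTF files like yours use `transcript_id`.

```python
def gtf_attribute_keys(line):
    """Return the attribute keys found in column 9 of one GTF line.

    Hypothetical helper for illustration; assumes tab-separated columns
    and GTF-style attributes of the form: key "value";
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("expected 9 tab-separated columns")
    keys = []
    for part in fields[8].split(";"):
        part = part.strip()
        if part:
            # the key is the first whitespace-delimited token
            keys.append(part.split(None, 1)[0])
    return keys


# First exon line from the log above, rebuilt with tabs (an assumption)
example_line = "\t".join([
    "chr19", "mm10_knownGene", "exon", "3066556", "3066627", "0", "+", ".",
    'gene_id "uc033hid.1"; transcript_id "uc033hid.1"; gene_name "uc033hid.1";',
])

keys = gtf_attribute_keys(example_line)
print("attribute keys:", keys)
print("has Parent:", "Parent" in keys)
print("has transcript_id:", "transcript_id" in keys)
```

If the line has no `Parent` key, then asking STAR to look for `Parent` would find no parent transcript on any exon, which would fit a warning on every line.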

Cheers, 

Kirill

rubi

Sep 27, 2015, 11:50:13 PM
to rna-star, nimrod.r...@gmail.com
Thanks!