'std::bad_alloc' error during mapping job


Brett Vanderwerff

May 28, 2018, 2:56:33 PM
to rna-star
Hello,

I’m mapping RNA-Seq reads to the human genome and getting the following error after the “inserting junctions into the genome indices” step:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc


I’m on a desktop computer with 1 processor and 32 GB of RAM. I’m running Ubuntu 18.04 from a bootable 16 GB flash drive (is this an issue?). I’m reading all the files for this analysis from a 1 TB external hard drive and writing all output back to the same drive (>230 GB free space on it). I know my setup is a bit goofy, but I'm doing this on university computers that limit customized setups.

I’m not sure if there is something I am missing, if this is a bug, or if I need to go to a cloud service to get enough RAM to do this. I’m new to RNA-seq and apologize if I am missing something obvious. I googled around, but could not seem to find an answer to this problem.



Some details:


I’m not generating the genome index myself; rather, I got all the pre-made files from:
http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/ENSEMBL/homo_sapiens/ENSEMBL.homo_sapiens.release-83/

along with the primary assembly:

(http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/ENSEMBL/homo_sapiens/ENSEMBL.homo_sapiens.release-83/Homo_sapiens.GRCh38.dna.primary_assembly.fa)

and the annotation file:



I’m using the pre-compiled executable STAR-2.6.0a with the following settings:

--genomeDir (points to folder with the contents of the first labshare link above, minus the primary assembly fa file and gtf annotation file) \
--sjdbGTFfile (points to Homo_sapiens.GRCh38.83.gtf from labshare link) \
--runThreadN 1 \
--outSAMstrandField intronMotif \
--outFilterIntronMotifs RemoveNoncanonical \
--outFileNamePrefix (just set to meaningful name) \
--readFilesIn (points to single end read zipped fastq file) \
--readFilesCommand zcat \
--outSAMtype BAM Unsorted \
--outReadsUnmapped Fastx \
--outSAMmode Full
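For anyone copying this, the settings above assemble into a single command like the following. The paths here are hypothetical placeholders (the real ones were not shown); only the flags come from this post:

```shell
# Hypothetical paths -- substitute your own. Flags are exactly those listed above.
STAR --runThreadN 1 \
     --genomeDir /media/ext/GRCh38_index \
     --sjdbGTFfile /media/ext/Homo_sapiens.GRCh38.83.gtf \
     --readFilesIn /media/ext/SRR3194428.fastq.gz \
     --readFilesCommand zcat \
     --outSAMstrandField intronMotif \
     --outFilterIntronMotifs RemoveNoncanonical \
     --outSAMtype BAM Unsorted \
     --outReadsUnmapped Fastx \
     --outSAMmode Full \
     --outFileNamePrefix SRR3194428_
```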

Here is the Log.out file from my mapping run on the zipped fastq input (http://textuploader.com/dp16x; also attached).

I also pasted the Log.out file that came with the genome index material I downloaded from the labshare link (http://textuploader.com/dp16b; also attached).

If anyone has any ideas, please let me know. I’d really like to use STAR, the documentation for it seems excellent, but I am hitting a bit of a wall on this one.

Sincerely,

Brett Vanderwerff
SRR3194428Log.out
Log.out

Brett Vanderwerff

May 29, 2018, 10:21:06 PM
to rna-star
I have been looking at this some more. I think I am going to try generating my own genome index files with a --genomeSAsparseD value of 2 and see if that solves the error. I originally used the publicly available ones from http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/ENSEMBL/homo_sapiens/ENSEMBL.homo_sapiens.release-83/ because I was having issues generating my own index files (probably memory issues, I thought, since I read that index generation is more memory-intensive than mapping). I'm going to try some different configurations when generating the genome to see if I can get it to go. I'll report back if this works.

Alexander Dobin

May 30, 2018, 6:54:26 PM
to rna-star
Hi Brett,

--genomeSAsparseD 2 will definitely solve the problem; however, even without it, you should be able to fit into 32 GB of RAM.
Please try adding --limitSjdbInsertNsj, as the problem happens at the junction-insertion step.
Also, what's the output of 
$ free -g

Cheers
Alex
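To see why 32 GB should be enough, and why --genomeSAsparseD 2 gives extra headroom, here is a rough, unofficial back-of-envelope estimate. The 1-byte-per-base genome figure and ~8 bytes per stored suffix are my assumptions for illustration, not numbers from this thread; --genomeSAsparseD D keeps roughly every D-th suffix:

```python
def index_ram_estimate_gb(genome_len, sa_sparse_d=1):
    """Rough RAM estimate for a STAR genome index.

    Assumes ~1 byte per base for the in-memory genome plus ~8 bytes
    per indexed suffix, with --genomeSAsparseD keeping every D-th
    suffix. A back-of-envelope approximation, not STAR's exact math.
    """
    genome_bytes = genome_len                 # genome array, ~1 byte/base
    sa_bytes = 8 * genome_len / sa_sparse_d   # sparse suffix array
    return (genome_bytes + sa_bytes) / 1e9

human = 3.1e9  # approximate length of the GRCh38 primary assembly
print(round(index_ram_estimate_gb(human, 1), 1))  # dense SA:  ~27.9 GB
print(round(index_ram_estimate_gb(human, 2), 1))  # sparse SA: ~15.5 GB
```

Under these assumptions a dense index sits just under 32 GB, while --genomeSAsparseD 2 roughly halves the suffix-array footprint, which matches the advice above.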

Brett Vanderwerff

Jun 1, 2018, 12:37:10 PM
to rna-...@googlegroups.com
Alex,

Thanks for the reply. Your activity at this group makes it a treasure trove for troubleshooting and I am grateful you took the time to respond to my post.

I ran the following:

--runThreadN 1 \
        --runMode genomeGenerate \
        --genomeDir (points to empty genome directory) \
        --genomeFastaFiles (points to primary assembly) \
        --sjdbGTFfile (points to annotation file) \
        --sjdbOverhang 100 \
        --genomeSAsparseD 2

I also tried:

--runThreadN 1 \
        --runMode genomeGenerate \
        --genomeDir (points to empty genome directory) \
        --genomeFastaFiles (points to primary assembly) \
        --sjdbGTFfile (points to annotation file) \
        --sjdbOverhang 100 \
        --genomeSAsparseD 2 \
        --limitSjdbInsertNsj 300000 \
        --limitGenomeGenerateRAM 25000000000


In both cases it hangs at "sorting suffix array chunks and saving them to disk" for about 16 hours on the same 32 GB desktop I described earlier. I have not tried running it longer (should it take that long?). In both cases the only visible activity is that it generates the following files, and nothing else as far as I can see:

chrLength.txt
chrName.txt
chrNameLength.txt
chrStart.txt

I tried to generate the genome starting from both ENSEMBL releases 75 and 83 from http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/ENSEMBL/homo_sapiens/. I also tried generating the genome from a manual download of the primary assembly and annotation of release 75 directly from ENSEMBL, in case something was corrupted on the labshare link.

I attached screenshots of free -m both at resting state and under the load of trying to generate the genome. I also attached a screenshot of the terminal output after typing free -g.

The only other thing I can think of is to try generating the genome with just the sequence of one chromosome to see if something smaller goes through. I would be happy to try any suggestions you have.

Sincerely,

Brett Vanderwerff
memory at rest.png
memory under load.png
g flag.png

Alexander Dobin

Jun 1, 2018, 5:07:44 PM
to rna-star
Hi Brett,

a couple of points that are separate from each other.

1. I realized that I overlooked a problematic parameter in your original e-mail. When you run mapping with the genome that you downloaded, you should not use the --sjdbGTFfile option - the annotations are already included in the genome index, and including them again will waste RAM.
Also, if you use more than one thread, I would recommend reducing the per-thread buffer size --limitIObufferSize to 80000000.

2. For the genome generation, it seems that you have 30000 MB of free RAM, which should definitely be enough for the genome with --genomeSAsparseD 2 (under load you have ~10000 MB free). So it does not look like a memory problem. Please try running it with more threads, e.g. --runThreadN 6. I recall there were some reports of spurious problems when running just one thread. If it does not work, please send me the Log.out file.

Cheers
Alex

Brett Vanderwerff

Jun 1, 2018, 8:34:50 PM
to rna-star
Dear Alex,

Again, thank you very much for your suggestions. I will experiment with these options and post what works/does not work so that people searching for solutions to similar obstacles can see the outcome. 

Sincerely, 

Brett Vanderwerff 

Brett Vanderwerff

Jun 4, 2018, 5:14:09 PM
to rna-...@googlegroups.com
If anyone is looking at this for troubleshooting: I ran the following to generate the genome, and it worked on my 32 GB, 1-core desktop:


--runThreadN 6 \
        --runMode genomeGenerate \
        --genomeDir (points to empty genome directory) \
        --genomeFastaFiles (points to primary assembly) \
        --sjdbGTFfile (points to annotation file) \
        --sjdbOverhang 100 \
        --genomeSAsparseD 2 \
        --limitIObufferSize 80000000

It would hang with these settings:

--runThreadN 1 \
        --runMode genomeGenerate \
        --genomeDir (points to empty genome directory) \
        --genomeFastaFiles (points to primary assembly) \
        --sjdbGTFfile (points to annotation file) \
        --sjdbOverhang 100 \
        --genomeSAsparseD 2

I am still familiarizing myself with the mapping step, I will post back with any configurations that I find to work on my setup. 
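Per Alex's earlier point 1, the mapping run against a freshly generated index like this should omit --sjdbGTFfile, since the annotations are already baked into the index. A sketch, with hypothetical placeholder paths standing in for the real ones:

```shell
# Hypothetical paths -- substitute your own.
# Note: no --sjdbGTFfile here; the annotations are already
# built into the index generated above.
STAR --runThreadN 1 \
     --genomeDir /media/ext/GRCh38_sparse_index \
     --readFilesIn /media/ext/SRR3194428.fastq.gz \
     --readFilesCommand zcat \
     --outSAMstrandField intronMotif \
     --outFilterIntronMotifs RemoveNoncanonical \
     --outSAMtype BAM Unsorted \
     --outFileNamePrefix SRR3194428_
```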