Running out of memory when sorting .bam file

Mike

Dec 3, 2014, 12:45:25 PM
to rna-...@googlegroups.com
I'm running out of memory when sorting the .bam file on a system with 18GB of RAM. I'm using --runThreadN 8:

EXITING because of fatal ERROR: not enough memory for BAM sorting:
SOLUTION: re-run STAR with at least --limitBAMsortRAM 51166447416

I generated my genome index using --genomeSAsparseD 2, so the genome index is ~16GB. When I don't set --limitBAMsortRAM (i.e., using the default of 0), the STAR manual indicates that sorting should limit RAM to the genome size, but apparently it's still using more than 16GB. When I set --limitBAMsortRAM 15000000000 (assuming --limitBAMsortRAM expects a number of bytes) to limit it to 15GB, I also ran out of memory.
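
For reference, my command is roughly along these lines (the index path and read-file names here are placeholders rather than the exact values):

STAR --runThreadN 8 \
     --genomeDir /path/to/index_sparse2 \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
     --readFilesCommand zcat \
     --outSAMtype BAM SortedByCoordinate \
     --limitBAMsortRAM 15000000000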

Alexander Dobin

Dec 4, 2014, 11:51:01 PM
to rna-...@googlegroups.com
Hi Mike,

unfortunately, I think you need ~50GB of RAM to sort this file with the current STAR algorithm, so you would have to resort to unsorted output and then sort with samtools.
The problem is that, at the moment, STAR splits the reads into equal-size genomic buckets, and the largest of your buckets contains 51GB of alignments.
This may happen if you are mapping a very large number of reads, or if you have a very highly expressed locus.
I am planning to introduce adaptive bucket size for sorting in the future.
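
As a rough sketch of that workaround (the file names below are STAR's defaults, the thread/memory values are just examples to tune, and the exact samtools sort syntax depends on your samtools version; this is the newer "-o output file" form):

STAR [same options as before] --outSAMtype BAM Unsorted
# STAR writes Aligned.out.bam; sort it outside STAR
samtools sort -@ 8 -m 2G -o Aligned.sortedByCoord.out.bam Aligned.out.bam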

Cheers
Alex

Mike

Dec 4, 2014, 11:57:16 PM
to rna-...@googlegroups.com
>or if you have a very highly expressed locus.

I do in fact have a very highly expressed locus, so that could be it. Thank you.

swaraj basu

Jun 7, 2016, 12:23:30 PM
to rna-star
I faced the same issue when mapping reads to a small genome (the human mitochondrial genome), so this is probably something Alex might want to look into.

Alexander Dobin

Jun 7, 2016, 5:30:37 PM
to rna-star
Hi Swaraj,

have you tried the latest version? If so, could you please send me the Log.out file from that run?

Cheers
Alex

swaraj basu

Jun 8, 2016, 5:37:01 AM
to rna-star
Here is the log, Alex.
ZZZLog.out

Alexander Dobin

Jun 10, 2016, 12:34:35 PM
to rna-star
Hi Swaraj,

thanks for the file. What is the total number of reads that you are sorting?
This looks like a bug; I will look into it next week.

Cheers
Alex

Daniel Gerlach

Dec 22, 2016, 4:40:58 AM
to rna-star
Dear Alex,

This looks like an older thread, but I am seeing the same problem for some of my samples. I am using the Linux64 pre-compiled STAR binary, version 2.5.2b, on AWS EC2 machines via DNAnexus (mem3_hdd2_x4). Over 1000 samples ran through without an issue, including CCLE cell lines (which had paired-end FASTQ files of over 2x10GB). Now, on GTEx samples, about 10% fail, even though the paired-end FASTQ files are only about 2x2.7GB in size. The error message:

Dec 20 20:25:37 ..... started sorting BAM
EXITING because of fatal ERROR: not enough memory for BAM sorting: 
SOLUTION: re-run STAR with at least --limitBAMsortRAM 32372931538
Dec 20 20:25:37 ...... FATAL ERROR, exiting

I will try to increase the memory; I was just wondering why I have issues with the GTEx samples, which are much smaller than many of the CCLE samples, for which I never encountered this error message.

Best, Daniel

Alexander Dobin

Dec 22, 2016, 12:31:24 PM
to rna-star
Hi Daniel,

could you please send me the Log.out file?
One possibility is that the failed samples have a few very highly expressed loci (e.g. rRNA or chrM). Sorting puts the alignments into separate bins, and one of the bins might be overflowing.
Do you know if the reads in these files have been sorted by alignment coordinate? This may happen if the fastq files were recovered from BAM files.
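
If you still have the BAM files the FASTQs were recovered from, a quick check is the @HD line of the header; SO:coordinate there means they were coordinate-sorted (the file name is a placeholder):

samtools view -H original.bam | grep '^@HD'
# e.g.  @HD  VN:1.4  SO:coordinate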

Cheers
Alex

Kyle Chang

Jan 5, 2017, 1:13:49 AM
to rna-star
Hi Alex, I'm experiencing the same memory error when sorting the BAM file. It comes from a TCGA CRC sample (78f35f00-e807-43dc-8fe4-98bcee7fd6ae_gdc_realn_rehead.bam). I used Picard to extract paired-end reads and aligned them with STAR 2-pass to hg19.

Attached is our Log.out file.
TCGA-A6-2678-11A-01R-A32Z-07Log.out

Alexander Dobin

Jan 6, 2017, 4:45:37 PM
to rna-star
Hi Kyle,

I suspect that the BAM file from which the reads were extracted was sorted by coordinate. 
This causes the problem for STAR sorting, since STAR uses the first 100,000 reads to define the sorting bins, expecting the reads to come from random positions on the genome. 
Ideally, to solve this problem, you would need to randomize the read order before mapping - I think the samtools bamshuf command will do it for the BAM file before Picard converts it to FASTQs.

Alternatively, if you have enough RAM, you can try to increase the sorting memory; for this particular sample, --limitBAMsortRAM 33000000000.
In principle, you can set it to your RAM amount minus a few GB.
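
A rough sketch of that order of operations (file names are placeholders, and in newer samtools releases bamshuf is called collate):

# shuffle the coordinate-sorted BAM so the read order is random again
samtools bamshuf tcga_sorted.bam tcga_shuffled        # writes tcga_shuffled.bam
# extract FASTQs from the shuffled BAM
java -jar picard.jar SamToFastq I=tcga_shuffled.bam FASTQ=reads_1.fastq SECOND_END_FASTQ=reads_2.fastq
# then map with STAR as before, optionally with the larger sorting buffer
STAR [usual options] --limitBAMsortRAM 33000000000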

Cheers
Alex

Arjun Rao

Mar 17, 2017, 10:40:10 AM
to rna-star
Hi Alex,

Sorry if I'm restarting an old thread, but I didn't want to start a new thread for this. I'm running into the same problems as the other people on this thread.

I'm running STAR as a component of a completely automated pipeline, and most of the input parameters are calculated on the fly. It looks like the --limitBAMsortRAM parameter is not easily calculable from the input FASTQ size and/or the STAR index size. I'm thinking of changing my pipeline to emit an unsorted STAR BAM and then sort it with the tried-and-tested `samtools sort`. Is there any drawback to doing this?

Thanks in advance,
Arjun

Alexander Dobin

Mar 17, 2017, 11:08:14 AM
to rna-star
Hi Arjun,

the only drawback to sorting with samtools is that it's likely to be slower.

Cheers
Alex