Cannot generate human genome index!!

Stephen Williams

Jan 17, 2015, 10:58:02 PM
to rna-...@googlegroups.com
I am trying to generate a human genome index from the FASTA file Homo_sapiens.GRCh38.dna.primary_assembly.fa (Ensembl) with the following command. I'm using an AWS m3.2xlarge instance, which has 8 cores and 30GB RAM. I'm also using the latest version of STAR and have tried both the /bin/Linux_x86_64 and /bin/Linux_x86_64_static builds.

STAR  --runMode genomeGenerate --runThreadN 8 --genomeDir /home/genome/ --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --limitGenomeGenerateRAM 30000000000


I get the following error:

Jan 18 01:57:30 ..... Started STAR run
Jan 18 01:57:30 ... Starting to generate Genome files
Jan 18 01:58:39 ... starting to sort  Suffix Array. This may take a long time...
Jan 18 01:58:59 ... sorting Suffix Array chunks and saving them to disk...
Jan 18 02:18:29 ... loading chunks from disk, packing SA...
Jan 18 02:24:54 ... writing Suffix Array to disk ...
Jan 18 02:29:06 ... Finished generating suffix array
Jan 18 02:29:06 ... starting to generate Suffix Array index...
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)



Any and all help would be greatly appreciated. From what I've read, this seems like a possible computing-resource issue, but I believe I meet all the requirements for STAR.

Also, is there a human genome index that has already been generated for STAR that I can simply download if I can't get this to work?

Thanks

Stephen 



Log.out

Alexander Dobin

Jan 20, 2015, 6:48:35 PM
to rna-...@googlegroups.com
Hi Stephen,

It looks like an issue with RAM, although 30GB should be enough for the human genome. Please check that no other processes are taking a significant amount of RAM.
What does the 'free -m' command tell you?
You can download the GRCh38 STAR genome here:
Alex

Stephen Williams

Jan 20, 2015, 7:33:35 PM
to rna-...@googlegroups.com
Thanks for the reply Alex.

Here is what I got for 'free -m'

bioinfo@$ free -m
             total       used       free     shared    buffers     cached
Mem:         30159       1172      28987          0          9        901
-/+ buffers/cache:        260      29899
Swap:            0          0          0

It looks like a small amount of memory is being used by another process. I ran again with --limitGenomeGenerateRAM 2800000000 and got the same result. I'm very new to STAR; are there any other parameters that I can set to reduce the amount of RAM consumed?

Thanks again,

Stephen   
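
As a complement to reading the 'free -m' output by eye, the memory check can be scripted. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function name available_bytes is hypothetical and not part of STAR. It reports the kernel's MemAvailable estimate when present and otherwise falls back to MemFree + Buffers + Cached, which corresponds to the "free" figure on the "-/+ buffers/cache" row above.

```shell
# Sketch: print the number of bytes available to a new process.
# Reads a meminfo-format file (default: /proc/meminfo, Linux only).
# available_bytes is a hypothetical helper, not a STAR command.
available_bytes() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemAvailable:" { avail = $2; have = 1 }   # newer kernels
        $1 == "MemFree:"      { free = $2 }
        $1 == "Buffers:"      { buf = $2 }
        $1 == "Cached:"       { cached = $2 }
        END {
            # Values in meminfo are in kB; convert to bytes.
            kb = have ? avail : free + buf + cached
            printf "%.0f\n", kb * 1024
        }
    ' "$meminfo"
}
```

Calling available_bytes with no argument reads /proc/meminfo; the result can then be compared with the value intended for --limitGenomeGenerateRAM before starting a long run.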

Stephen Williams

Jan 20, 2015, 8:40:13 PM
to rna-...@googlegroups.com
UPDATE*

Hi Alex,
I upgraded to an instance with 60GB of RAM and it worked just fine.  Thanks for the clarification!

Stephen 

pbpa...@gmail.com

Feb 21, 2018, 2:13:46 PM
to rna-...@googlegroups.com
Hi,

I am also trying to generate a human genome index. The link to the pre-built index is not working!
What did you mean by upgrading to 60GB of RAM? I have 128GB of RAM and am getting the same error that you mentioned in your initial post.

EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=112000000000is too small for your genome
SOLUTION: please specify --limitGenomeGenerateRAM not less than 116041738282 and make that much RAM available

I did "free -m" and am getting:

             total       used       free     shared    buffers     cached
Mem:        120868      37377      83491          0          0      35835
-/+ buffers/cache:       1540     119327
Swap:       122867         40     122827


What did you do?

Attached is my log file. Any suggestions will be really helpful!!


Thanks,
Payal
Log.out
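
The numbers in this post can be checked against each other with a few lines of shell arithmetic. This is only a sketch: it assumes that 'free -m' reports mebibytes (1024 * 1024 bytes) and a shell with 64-bit integer arithmetic, and it says nothing about STAR's actual peak memory use. All three values are copied from the error message and the 'free -m' output above.

```shell
# Values copied from the post above.
required_bytes=116041738282    # minimum named in STAR's SOLUTION line
requested_bytes=112000000000   # value passed to --limitGenomeGenerateRAM
free_mb=119327                 # "free" on the "-/+ buffers/cache" row

# Assumption: free -m reports MiB, so convert with 1024 * 1024.
free_bytes=$((free_mb * 1024 * 1024))

echo "free bytes:     $free_bytes"
echo "required bytes: $required_bytes"

if [ "$requested_bytes" -lt "$required_bytes" ]; then
    echo "requested limit is below the stated minimum"
fi
if [ "$free_bytes" -ge "$required_bytes" ]; then
    echo "free memory is above the stated minimum"
fi
```

Under these assumptions the machine has more free memory than the stated minimum, so the error concerns the value given to --limitGenomeGenerateRAM rather than a shortage of physical RAM.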

pbpa...@gmail.com

Feb 22, 2018, 12:47:55 PM
to rna-star
Hi,

Just an update: it's running now after I added --genomeSAsparseD, --genomeSAindexNbases and --genomeChrBinNbits in addition to increasing --limitGenomeGenerateRAM, but now it's crashing because it can't open the GTF file (error at the end of the Log.out file):
FATAL error, could not open file sjdbGTFfile=home/pbanerjee/Human_STAR_Genome/Homo_sapiens.GRCh38.91_ERCC92.gtf

STAR --runThreadN 8 --runMode genomeGenerate --limitGenomeGenerateRAM=115970609877 --genomeSAsparseD 3 --genomeSAindexNbases 12 --genomeChrBinNbits=16 --genomeDir /path/ --genomeFastaFiles /path/Homo_sapiens.GRCh38.dna.toplevel_ERCC92.fa --sjdbGTFfile Homo_sapiens.GRCh38.91_ERCC92.gtf

I am trying to follow the conversation at https://github.com/alexdobin/STAR/issues/292, but even after increasing the limit to --limitGenomeGenerateRAM=119000000000 it crashed. Please let me know if you can see where I am going wrong!

Log file attached!


Thanks,
Payal

 
On Saturday, January 17, 2015 at 10:58:02 PM UTC-5, Stephen Williams wrote:
Log.out

Alexander Dobin

Feb 23, 2018, 2:41:27 PM
to rna-star
Hi Payal,

I think you are missing the leading "/" in the path, i.e. --sjdbGTFfile /home/pbanerjee/Human_STAR_Genome/Homo_sapiens.GRCh38.91_ERCC92.gtf
so STAR cannot find the file.
Also, you are using the "toplevel" assembly, which includes all the patches and alternative loci and increases the genome size to >>3GB.
I strongly recommend using the "primary_assembly" file such as:
Then you can use the default --genomeSAsparseD 1

Cheers
Alex
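
A missing leading "/" turns an absolute path into one that is resolved relative to the current working directory, which is why the file could not be opened. One way to catch this before a long run is to test each input file first. The sketch below is an illustration only; check_input is a hypothetical helper, not part of STAR.

```shell
# Sketch: fail early if an input file cannot be read, otherwise print
# its absolute path. check_input is a hypothetical helper, not part of STAR.
check_input() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1 (working directory: $(pwd))" >&2
        return 1
    fi
    # Resolve the containing directory so the path works from anywhere.
    dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd) || return 1
    echo "$dir/$(basename -- "$1")"
}
```

For example, gtf=$(check_input Homo_sapiens.GRCh38.91_ERCC92.gtf) || exit 1 stops a script before STAR is started, and "$gtf" can then be passed to --sjdbGTFfile.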

pbpa...@gmail.com

Feb 28, 2018, 1:51:13 PM
to rna-star
Thanks Alex, it's working now.

STAR --runThreadN 8 --runMode genomeGenerate --limitGenomeGenerateRAM=119000000000 --genomeSAsparseD 3 --genomeSAindexNbases 12 --genomeChrBinNbits=16 --genomeDir GenomeIndex --genomeFastaFiles Genome_ERCC.fa --sjdbGTFfile Genome_ERCC92.gtf

Payal
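
Since the difference between the "toplevel" and "primary_assembly" files discussed earlier in the thread comes down to total sequence length, it can help to measure a FASTA file before indexing it. This is a rough sketch, assuming an uncompressed FASTA with Unix line endings; fasta_total_length is a hypothetical helper, not a STAR command.

```shell
# Sketch: sum the lengths of all sequence lines in an uncompressed FASTA.
# Header lines (starting with ">") are skipped.
# fasta_total_length is a hypothetical helper, not part of STAR.
fasta_total_length() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}
```

Running fasta_total_length Genome_ERCC.fa prints the number of bases, which makes it easy to see whether a file is much larger than a primary assembly.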