Building human genome index memory errors.


Roy Francis

Jul 21, 2018, 2:38:44 PM
to rna-...@googlegroups.com
I am using the FASTA and GTF files from ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/dna/ with STAR version 2.5.2b. The machine has 20 cores and a total of 128 GB RAM.

I tried to first build the index as below:

star \
--runMode genomeGenerate \
--runThreadN 20 \
--genomeFastaFiles "$path_hs_genome" \
--sjdbGTFfile "$path_hs_gtf" \
--genomeDir "$path_starindex_hs"


It fails with this error:

Jul 20 12:25:18 ..... started STAR run
Jul 20 12:25:18 ... starting to generate Genome files
Jul 20 12:38:05 ... starting to sort Suffix Array. This may take a long time...
Jul 20 12:40:10 ... sorting Suffix Array chunks and saving them to disk...
slurmstepd: error: Job 3952147 exceeded memory limit (131270204 > 131072000), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 3952147 ON r22 CANCELLED AT 2018-07-20T12:51:46 ***

I am not sure why STAR doesn't detect the available memory and stay within it. Anyway, I decided to add an extra parameter --limitGenomeGenerateRAM:

star \
--runMode genomeGenerate \
--runThreadN 20 \
--limitGenomeGenerateRAM 12800000000 \
--genomeFastaFiles "$path_hs_genome" \
--sjdbGTFfile "$path_hs_gtf" \
--genomeDir "$path_starindex_hs"


Then I get this error:

Jul 20 14:47:11 ..... started STAR run
Jul 20 14:47:11 ... starting to generate Genome files


EXITING because of FATAL PARAMETER ERROR
: limitGenomeGenerateRAM=128000000is too small for your genome
SOLUTION
: please specify limitGenomeGenerateRAM not less than152003700778 and make that much RAM available


Jul 20 14:56:22 ...... FATAL ERROR, exiting



I set --genomeSAsparseD 2 and it still failed. I tried 3 and 4, and those failed as well.

Then, I started fiddling with more options. Tried --genomeSAindexNbases 12. Same error.

Eventually I ended up with this:

star \
--runMode genomeGenerate \
--runThreadN 18 \
--limitGenomeGenerateRAM 12800000000 \
--sjdbOverhang 149 \
--genomeSAsparseD 6 \
--genomeSAindexNbases 10 \
--genomeChrBinNbits 16 \
--genomeFastaFiles "$path_hs_genome" \
--sjdbGTFfile "$path_hs_gtf" \
--genomeDir "$path_starindex_hs"


I still get the same error:

Jul 21 20:12:38 ..... started STAR run
Jul 21 20:12:38 ... starting to generate Genome files


EXITING because of FATAL PARAMETER ERROR
: limitGenomeGenerateRAM=12800000000is too small for your genome
SOLUTION
: please specify limitGenomeGenerateRAM not less than151837151957 and make that much RAM available


Jul 21 20:20:39 ...... FATAL ERROR, exiting
End of Script. Script took 500 seconds.


I am pretty sure I've read somewhere that the index can be created with as little as 30 GB of RAM. I have 128 GB and that doesn't seem to be enough. Any suggestions on what might be wrong?
I have also attached the Log.out file from the last run (script above).
Thanks,
Roy
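
The byte counts in this post are easier to compare once converted to gigabytes. The following is a minimal sketch, not part of the original scripts: `bytes_to_gb` is a hypothetical helper, 1 GB is taken as 10^9 bytes, and the slurmstepd limit is assumed to be reported in kilobytes.

```shell
#!/usr/bin/env bash
# Convert a byte count to decimal gigabytes (1 GB = 10^9 bytes), one decimal place.
# bytes_to_gb is a hypothetical helper name, not a STAR or Slurm tool.
bytes_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1e9 }'
}

limit_passed=12800000000     # value given to --limitGenomeGenerateRAM in the commands above
limit_needed=151837151957    # minimum requested by STAR in the last run
slurm_limit_kb=131072000     # job limit from the slurmstepd message (assumed to be in KB)

echo "passed to STAR:   $(bytes_to_gb "$limit_passed") GB"
echo "required by STAR: $(bytes_to_gb "$limit_needed") GB"
echo "Slurm job limit:  $(bytes_to_gb $((slurm_limit_kb * 1024))) GB"
```

Under these assumptions the value passed to STAR works out to 12.8 GB rather than 128 GB, and the minimum STAR asks for is about 151.8 GB, which exceeds the job limit whichever value is passed.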


Log.out

Alexander Dobin

Jul 24, 2018, 3:15:09 PM
to rna-star
Hi Roy,

You are probably using the "toplevel" FASTA file, which contains patches and alt contigs that make the genome too big.

I would also recommend that you consider using GENCODE instead of ENSEMBL - it's very similar but a bit more user-friendly.
You would need to use their primary ("PRI") files as well.

Cheers
Alex
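
A quick way to tell whether a FASTA file is much larger than expected, before starting an index build, is to count its sequences and total bases. This is a minimal sketch assuming an uncompressed FASTA file; `fasta_stats` is a hypothetical helper name, not part of STAR.

```shell
#!/usr/bin/env bash
# Print the number of sequences and the total number of bases in a FASTA file.
# Lines starting with ">" are headers; every other line is counted as sequence.
# fasta_stats is a hypothetical helper name, not part of STAR.
fasta_stats() {
    awk '/^>/ { n++; next }
         { bases += length($0) }
         END { printf "%.0f sequences, %.0f bases\n", n, bases }' "$1"
}
```

For example, `fasta_stats "$path_hs_genome"` could be run before genomeGenerate. The human primary assembly is roughly 3 billion bases, so a total far above that suggests the file includes extra sequences such as patches and alt contigs.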

Roy Francis

Jul 25, 2018, 6:02:58 AM
to rna-...@googlegroups.com
Ah yes! It works now. The Ensembl toplevel file is 54 GB (uncompressed), compared to the GENCODE primary FASTA, which is 3 GB (uncompressed). Massive difference!
Thanks again,
Roy 
