Genome indexing gets stuck at "loading chunks from disk, packing SA..."


Tetyana

Jul 8, 2016, 12:36:20 PM
to rna-star
Hi Alex,

I am indexing a 13.5 Gb genome consisting of 700000 scaffolds with STAR v. 2.5.2a.
Regardless of the number of CPUs and the amount of RAM available, STAR gets stuck at "loading chunks from disk, packing SA..."

After 10 days, I terminated the first run, which had 10 CPUs, and started a new one with 24 CPUs on a machine that has 2 TB of memory.
STAR has now been at "loading chunks from disk, packing SA" for the second day.

My command line:
/path/to/STAR-2.5.2a/bin/Linux_x86_64/STAR --runThreadN 28 --runMode genomeGenerate --genomeChrBinNbits 12 --limitGenomeGenerateRAM 500000000000 --genomeDir /path/to/indexes --genomeFastaFiles /path/to/genome.fa --sjdbGTFfile /path/to/genome.annotation.gtf --sjdbOverhang 124

The tail of my Log.out:
"localscratch/BOKU/wheat_indexes/SA_7 ; empty space on disk = 506301030400 bytes ...
Writing 7129671520 bytes into /localscratch/BOKU/wheat_indexes/SA_25 ; empty space on disk = 505674633216 bytes ...Writing 7408973336 bytes into /localscratch/BOKU/wheat_indexes/SA_0 ; empty space on disk = 494818041856 bytes ... done
 done
Writing 7145862664 bytes into /localscratch/BOKU/wheat_indexes/SA_15 ; empty space on disk = 478122205184 bytes ... done
 done
 done
Writing 7222180672 bytes into /localscratch/BOKU/wheat_indexes/SA_5 ; empty space on disk = 470490243072 bytes ... done
Writing 7431444288 bytes into /localscratch/BOKU/wheat_indexes/SA_22 ; empty space on disk = 463268061184 bytes ... done
Writing 7403442736 bytes into /localscratch/BOKU/wheat_indexes/SA_16 ; empty space on disk = 455836610560 bytes ...Writing 7498716064 bytes into /localscratch/BOKU/wheat_indexes/SA_2 ; empty space on disk = 450029514752 bytes ... done
 done
Writing 7248852832 bytes into /localscratch/BOKU/wheat_indexes/SA_9 ; empty space on disk = 440934440960 bytes ... done
Writing 7225747560 bytes into /localscratch/BOKU/wheat_indexes/SA_14 ; empty space on disk = 433685573632 bytes ... done
Writing 4151683360 bytes into /localscratch/BOKU/wheat_indexes/SA_28 ; empty space on disk = 426459820032 bytes ... done
Jul 07 16:27:02 ... loading chunks from disk, packing SA..."
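
The total size of the chunks written in this phase can be read off the "Writing ... bytes into ..." entries of Log.out. The following is a minimal Python sketch, assuming the entries keep the format shown in the excerpt above. The helper name `sa_chunk_sizes` is made up for illustration and is not part of STAR.

```python
import re

# Matches entries such as
#   "Writing 7129671520 bytes into /path/SA_25 ; empty space on disk = ..."
# Two entries can end up on one line when threads write concurrently,
# so findall() is used instead of matching line by line.
WRITE_RE = re.compile(r"Writing (\d+) bytes into (\S*SA_\d+)")

def sa_chunk_sizes(log_text):
    """Return {chunk_path: bytes_written} for each SA chunk found in log_text."""
    return {path: int(size) for size, path in WRITE_RE.findall(log_text)}

# Two interleaved entries copied from the Log.out excerpt above.
sample = (
    "Writing 7129671520 bytes into /localscratch/BOKU/wheat_indexes/SA_25 ; "
    "empty space on disk = 505674633216 bytes ..."
    "Writing 7408973336 bytes into /localscratch/BOKU/wheat_indexes/SA_0 ; "
    "empty space on disk = 494818041856 bytes ... done\n"
)

sizes = sa_chunk_sizes(sample)
total = sum(sizes.values())
print(f"{len(sizes)} chunks, {total / 1e9:.1f} GB")
```

Passing the full text of Log.out instead of `sample` would give the total volume that the "loading chunks from disk" step has to read back.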

Could you please help me resolve the problem?

Thank you!

Tetyana

Alexander Dobin

Jul 11, 2016, 6:13:29 PM
to rna-star
Hi Tetyana,

2 TB of RAM is more than enough for this genome; ~120 GB should suffice.
Please send me the Log.out file from one of the failed runs.

Cheers
Alex

Tetyana

Jul 12, 2016, 3:17:10 PM
to rna-star
Hi Alex,

Thank you for your response.
The indexing completed successfully today. It took 98 h.

Best,
Tetyana

Alexander Dobin

Jul 12, 2016, 3:37:34 PM
to rna-star
Hi Tetyana,

I am not sure why it took so long - if you send me the Log.out file, I will look into it.

Cheers
Alex

Tetyana

Jul 14, 2016, 7:15:19 AM
to rna-star
Hi Alex,

My Log.out.gz is here:
https://drive.google.com/open?id=0B8axfQ9gUrL6cmJVTnlQVFpRd0U

Let me know if there is a way to make indexing faster.

Cheers,
Tetyana

Alexander Dobin

Jul 14, 2016, 12:34:33 PM
to rna-star
Hi Tetyana,

The step that took ~4 days was loading the chunks of the genome from disk into RAM, ~100-200 GB.
Usually, this step takes little time compared with the other steps. Is it possible that your localscratch file system was busy at the time?
Is this genome publicly available? I will try it on my system.
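
One way to check whether the file system itself is slow is to time a plain sequential read of one of the chunk files. The following is a minimal Python sketch of such a check, not something STAR provides. The function name `read_throughput` and the 64 MiB block size are arbitrary choices made for illustration.

```python
import time

def read_throughput(path, block_size=64 * 1024 * 1024):
    """Read `path` sequentially; return (bytes_read, seconds, MB_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as handle:
        while True:
            block = handle.read(block_size)
            if not block:          # empty bytes object means end of file
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, elapsed, rate
```

It could be called as, for example, `read_throughput("/localscratch/BOKU/wheat_indexes/SA_0")`, using a chunk path from the Log.out above. A file that was written very recently may be served from the page cache, so the measured rate can overestimate the disk. For scale, reading ~100-200 GB in ~4 days works out to roughly 0.3-0.6 MB/s.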

Cheers
Alex

Tetyana

Jul 15, 2016, 9:58:47 AM
to rna-star

Hi Alex,

Actually, STAR indexing was the only job running on that machine during those 4 days.
Access was blocked for all other users.
The large number of scaffolds (735933) could be the reason for the long loading.

Unfortunately, this genome has not been released yet. I will let you know when it is available.
It may take several months.

Thank you,

Tetyana

Alexander Dobin

Jul 15, 2016, 3:05:24 PM
to rna-star
Hi Tetyana,

I have tried to index a similar genome (Triticum_aestivum.TGACv1.30.dna.toplevel.fa), and it completed in 3.5 hours (see the log below).
The "loading chunks from disk, packing SA..." step took only ~30 min, compared to 4 days for your run.
I used these parameters:
--runMode genomeGenerate --genomeDir ./ --genomeFastaFiles Triticum_aestivum.TGACv1.30.dna.toplevel.fa --limitGenomeGenerateRAM 120000000000 --runThreadN 6 --genomeChrBinNbits 12
Could you please try running it again? It seems to me that it might have been a fluke in system performance.
You only need to reserve ~130-150 GB of RAM.

Cheers
Alex


Jul 14 18:09:20 ... sorting Suffix Array chunks and saving them to disk...
Jul 14 20:47:06 ... loading chunks from disk, packing SA...
Jul 14 21:14:57 ... finished generating suffix array
Jul 14 21:14:58 ... generating Suffix Array index
Jul 14 21:22:18 ... completed Suffix Array index
Jul 14 21:22:18 ... writing Genome to disk ...
Jul 14 21:22:50 ... writing Suffix Array to disk ...
Jul 14 21:27:01 ... writing SAindex to disk
Jul 14 21:27:11 ..... finished successfully
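
To compare two runs step by step, the timestamps in the log can be turned into per-step durations. The following is a minimal Python sketch, assuming the "Mon DD HH:MM:SS ... message" format shown above. The log has no year, so one is supplied as a parameter, and month abbreviations are assumed to parse under an English locale. The name `step_durations` is made up for illustration.

```python
import re
from datetime import datetime

# "Jul 14 20:47:06 ... loading chunks from disk, packing SA..."
STAMP_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \.+ ?(.*)$")

def step_durations(log_lines, year=2016):
    """Return [(step_label, timedelta)]: time from each stamped line to the next."""
    events = []
    for line in log_lines:
        match = STAMP_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, match.group(2).strip()))
    # The last event has no successor, so it gets no duration.
    return [
        (label, nxt - start)
        for (start, label), (nxt, _) in zip(events, events[1:])
    ]

log_excerpt = """\
Jul 14 18:09:20 ... sorting Suffix Array chunks and saving them to disk...
Jul 14 20:47:06 ... loading chunks from disk, packing SA...
Jul 14 21:14:57 ... finished generating suffix array
Jul 14 21:14:58 ... generating Suffix Array index
Jul 14 21:22:18 ... completed Suffix Array index
Jul 14 21:22:18 ... writing Genome to disk ...
Jul 14 21:22:50 ... writing Suffix Array to disk ...
Jul 14 21:27:01 ... writing SAindex to disk
Jul 14 21:27:11 ..... finished successfully
""".splitlines()

for label, delta in step_durations(log_excerpt):
    print(f"{str(delta):>8}  {label}")
```

Running the same function over the lines of a slow run's Log.out would show which step accounts for the difference. The sketch does not handle a run that crosses a year boundary.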

Tetyana

Jul 21, 2016, 3:38:45 AM
to rna-star
Hi Alex,

I started re-running the indexing of the wheat genome yesterday at 14:00.
It has been at "packing SA..." for 18 h already.
I am terminating this run to let our sysadmin upgrade the system.

Thank you,
Tetyana