STAR generating genomes (2nd pass) is very slow


ERA

Apr 5, 2017, 9:48:22 AM
to rna-star

Hi Alex,

Could you help me choose STAR parameter values that make genome generation faster? STAR takes a long time to generate the final (2nd pass) genome, and it stalls at the “sorting Suffix Array chunks and saving them to disk” step.

 

Apr 04 21:01:09 ..... started STAR run

Apr 04 21:01:09 ... starting to generate Genome files

Apr 04 21:11:35 ... starting to sort Suffix Array. This may take a long time...

Apr 04 21:14:48 ... sorting Suffix Array chunks and saving them to disk...

 

Files generated by STAR at this step are

-rw-r--r-- 1 era        427 Apr  4 21:09 chrLength.txt

-rw-r--r-- 1 era        937 Apr  4 21:09 chrNameLength.txt

-rw-r--r-- 1 era        510 Apr  4 21:09 chrName.txt

-rw-r--r-- 1 era        488 Apr  4 21:09 chrStart.txt

-rw-r--r-- 1 era        957 Apr  4 21:01 genomeParameters.txt

-rw-r--r-- 1 era 8321997920 Apr  4 22:29 SA_0

-rw-r--r-- 1 era 8529424824 Apr  4 22:34 SA_1

-rw-r--r-- 1 era 7996888776 Apr  4 23:44 SA_12

-rw-r--r-- 1 era 7653443560 Apr  4 23:48 SA_13

-rw-r--r-- 1 era 8171180896 Apr  4 23:54 SA_14

-rw-r--r-- 1 era 8094993408 Apr  4 23:51 SA_15

-rw-r--r-- 1 era 8234286752 Apr  4 22:30 SA_2

-rw-r--r-- 1 era 8397152848 Apr  4 22:33 SA_3

 

My command was

$STAR --runThreadN 12 --limitGenomeGenerateRAM 200G --runMode genomeGenerate --genomeDir $indexGenome --genomeFastaFiles $fastaGenome --sjdbFileChrStartEnd $novelSpliceSites --sjdbOverhang 100 --genomeChrBinNbits 12

I had the same issue with values of 10, 12, 14 and 18 for the --genomeChrBinNbits parameter.

 

The Log.out is attached here.


Thanks,

ERA


Log.out

Alexander Dobin

Apr 6, 2017, 10:41:45 AM
to rna-star
Hi ERA,

did the genome generation finish? How long did it take?

Since you are generating the genome from scratch (and inserting 1st pass junctions), and your genome is large (~14GB), it may take several hours to generate the index.
If you are collecting the junctions from all samples, then this 2nd pass genome will be generated only once.
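
Collecting the junctions from all samples could look like the sketch below. It assumes each 1st pass run wrote an SJ.out.tab file whose first four tab-separated columns are chromosome, intron start, intron end and strand (coded 0/1/2), and it writes the 4-column chr/start/end/strand list that --sjdbFileChrStartEnd reads. The helper name merge_junctions and the file paths are hypothetical, and no filtering by read support is applied.

```shell
#!/usr/bin/env bash
# Sketch: merge SJ.out.tab files from several 1st pass runs into one
# de-duplicated junction list. merge_junctions is a hypothetical helper,
# not part of STAR.
merge_junctions() {
    # Keep chr, start, end; translate the strand code 0/1/2 into ./+/-
    # and print each distinct junction once, in order of first appearance.
    awk 'BEGIN { OFS = "\t"; s[0] = "."; s[1] = "+"; s[2] = "-" }
         { key = $1 OFS $2 OFS $3 OFS s[$4] }
         !(key in seen) { seen[key] = 1; print key }' "$@"
}

# Example call (paths are placeholders):
# merge_junctions sample*/SJ.out.tab > all_samples.SJ.tab
```

The merged file can then be passed once via --sjdbFileChrStartEnd, so the 2nd pass genome only has to be built a single time for all samples.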

If you need to speed it up, you can try to do the junction insertion on-the-fly. This should be much faster than re-generating the whole genome.

STAR <mapping job parameters> --genomeDir /path/to/1st/pass/genome/ --sjdbInsertSave All --sjdbFileChrStartEnd $novelSpliceSites --sjdbOverhang 100 --readFilesIn read1 read2

This will insert the novel junctions into the "old" 1st pass genome and then map the reads.
This can be done for each mapping run; however, with --sjdbInsertSave All STAR also saves the resulting genome, so you can re-use it for future mapping runs.
The new genome directory is _STARgenome inside the run directory. For the next mapping runs you can simply specify it as --genomeDir, and omit "--genomeDir /path/to/1st/pass/genome/ --sjdbInsertSave All --sjdbFileChrStartEnd $novelSpliceSites --sjdbOverhang 100".
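
The two kinds of run described above could be sketched as follows. This is only an illustration: the helper names (run, map_with_insertion, map_with_saved_genome), the DRY_RUN switch, the sample prefixes and the read file names are all hypothetical, and the location of the saved genome assumes STAR writes it to the output prefix followed by _STARgenome. With DRY_RUN=1 (the default here) the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch only: paths, sample names and helper names are placeholders.
DRY_RUN=${DRY_RUN:-1}            # 1 = print the commands instead of running STAR
STAR_BIN=${STAR_BIN:-STAR}
firstPassGenome=${firstPassGenome:-/path/to/1st/pass/genome/}
novelSpliceSites=${novelSpliceSites:-/path/to/novelSpliceSites}

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# First mapping run: insert the junctions into the 1st pass genome on the fly
# and save the resulting genome for later runs.
map_with_insertion() {           # args: output prefix, read1, read2
    run "$STAR_BIN" --genomeDir "$firstPassGenome" \
        --sjdbInsertSave All \
        --sjdbFileChrStartEnd "$novelSpliceSites" --sjdbOverhang 100 \
        --readFilesIn "$2" "$3" --outFileNamePrefix "$1"
}

# Later runs: point --genomeDir at the saved genome and drop the sjdb options.
map_with_saved_genome() {        # args: saved genome dir, output prefix, read1, read2
    run "$STAR_BIN" --genomeDir "$1" \
        --readFilesIn "$3" "$4" --outFileNamePrefix "$2"
}

map_with_insertion sampleA_ A_1.fastq A_2.fastq
# Assumed location: prefix "sampleA_" + "_STARgenome"
map_with_saved_genome sampleA__STARgenome sampleB_ B_1.fastq B_2.fastq
```

Other mapping options (thread count, output format and so on) would be added to both helpers in the same way as in a normal mapping job.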

Cheers
Alex

ERA

Apr 7, 2017, 2:18:58 PM
to rna-...@googlegroups.com

Hi Alex,

No, it did not finish, even after three days.

Re-generating the genome by inserting the junctions on-the-fly seems to work, even though I got the error message below. Am I right? Could you tell me what this error message means and how to solve it? I attach the Log.out file here.

Starting to map file # 0 …

EXITING because of FATAL ERROR: phtread error while creating thread # 4, error code: 11

Apr 07 13:44:16 ...... FATAL ERROR, exiting

Thanks,

ERA

LL2017_219_allGenGener_Log.out
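
For context on the error above: on Linux, error code 11 is EAGAIN, which pthread_create returns when the system lacks the resources for another thread or a limit on the number of threads has been reached. Whether that is what happened here cannot be confirmed from the thread. The sketch below only prints a few limits that may be worth comparing against --runThreadN; the helper name report_thread_limits is hypothetical, and values the shell or system does not expose are reported as "unknown".

```shell
#!/usr/bin/env bash
# Sketch: show limits that can make thread creation fail with EAGAIN.
# report_thread_limits is a hypothetical helper, not part of STAR.
report_thread_limits() {
    # Per-user limit on processes/threads for the current shell
    printf 'ulimit_u\t%s\n' "$(ulimit -u 2>/dev/null || echo unknown)"
    # System-wide thread limit (Linux only)
    printf 'threads_max\t%s\n' "$(cat /proc/sys/kernel/threads-max 2>/dev/null || echo unknown)"
    # Number of CPUs currently online
    printf 'cpus_online\t%s\n' "$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)"
}

report_thread_limits
```

If the per-user limit is small, or the job runs under a scheduler that restricts resources, lowering --runThreadN is one thing to try.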

ERA

Apr 10, 2017, 11:10:31 AM
to rna-star
Hi Alex,
Please close this ticket. 
Thanks,
ERA