STAR generating genomes (2nd pass) is very slow

ERA

Apr 5, 2017, 9:48:22 AM

to rna-star

Hi Alex,

Could you help me choose STAR parameter values that make genome generation faster? STAR takes a long time to generate the final genome (2nd pass) at the “sorting Suffix Array chunks and saving them to disk” step.

Apr 04 21:01:09 ..... started STAR run

Apr 04 21:01:09 ... starting to generate Genome files

Apr 04 21:11:35 ... starting to sort Suffix Array. This may take a long time...

Apr 04 21:14:48 ... sorting Suffix Array chunks and saving them to disk...

The files generated by STAR at this step so far are:

-rw-r--r-- 1 era 427 Apr 4 21:09 chrLength.txt

-rw-r--r-- 1 era 937 Apr 4 21:09 chrNameLength.txt

-rw-r--r-- 1 era 510 Apr 4 21:09 chrName.txt

-rw-r--r-- 1 era 488 Apr 4 21:09 chrStart.txt

-rw-r--r-- 1 era 957 Apr 4 21:01 genomeParameters.txt

-rw-r--r-- 1 era 8321997920 Apr 4 22:29 SA_0

-rw-r--r-- 1 era 8529424824 Apr 4 22:34 SA_1

-rw-r--r-- 1 era 7996888776 Apr 4 23:44 SA_12

-rw-r--r-- 1 era 7653443560 Apr 4 23:48 SA_13

-rw-r--r-- 1 era 8171180896 Apr 4 23:54 SA_14

-rw-r--r-- 1 era 8094993408 Apr 4 23:51 SA_15

-rw-r--r-- 1 era 8234286752 Apr 4 22:30 SA_2

-rw-r--r-- 1 era 8397152848 Apr 4 22:33 SA_3

My command was:

$STAR --runThreadN 12 --limitGenomeGenerateRAM 200G --runMode genomeGenerate --genomeDir $indexGenome --genomeFastaFiles $fastaGenome --sjdbFileChrStartEnd $novelSpliceSites --sjdbOverhang 100 --genomeChrBinNbits 12

I had the same issue with values of 10, 12, 14 and 18 for the --genomeChrBinNbits parameter.

The Log.out is attached here.

Thanks,

ERA

Log.out

Alexander Dobin

Apr 6, 2017, 10:41:45 AM

to rna-star

Hi ERA,

Did the genome generation finish? How long did it take?

Since you are generating the genome from scratch (and inserting 1st pass junctions), and your genome is large (~14GB), it may take several hours to generate the index.

If you are collecting the junctions from all samples, then this 2nd pass genome will be generated only once.

If you need to speed it up, you can try to do the junction insertion on-the-fly. This should be much faster than re-generating the whole genome.

STAR <mapping job parameters> --genomeDir /path/to/1st/pass/genome/ --sjdbInsertSave All --sjdbFileChrStartEnd $novelSpliceSites --sjdbOverhang 100 --readFilesIn read1 read2

This will insert the novel junctions into the "old" 1st pass genome and map the reads.

This can be done for each mapping run; however, you can also save the resulting genome and re-use it for future mapping runs.

The new genome directory is inside _STARgenome in the run directory. For the next mapping runs you can simply specify it as --genomeDir, and omit "--genomeDir /path/to/1st/pass/genome/ --sjdbInsertSave All --sjdbFileChrStartEnd $novelSpliceSites --sjdbOverhang 100" .
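
As a rough sketch of the two kinds of runs described above: the first inserts the junctions on the fly and saves the genome, and later runs re-use the saved genome. All paths and file names here are placeholders (`runDir` in particular is a hypothetical name for the directory the first job ran in), the default output prefix is assumed, and your own mapping parameters would still need to be added. The script only prints the two command lines so they can be checked before running.

```shell
# Sketch only: every path below is a placeholder, not taken from this thread's run.
firstPassGenome=/path/to/1st/pass/genome/
novelSpliceSites=/path/to/novel_junctions.tab   # hypothetical file name
runDir=/path/to/first/run                       # hypothetical: where the first job ran

# First run: insert the junctions on the fly, map, and save the new genome.
firstRun="STAR --genomeDir $firstPassGenome --sjdbInsertSave All --sjdbFileChrStartEnd $novelSpliceSites --sjdbOverhang 100 --readFilesIn read1 read2"

# Later runs: point --genomeDir at the saved genome and drop the sjdb options.
laterRun="STAR --genomeDir $runDir/_STARgenome --readFilesIn read1 read2"

# Print both command lines for review; nothing is executed here.
echo "$firstRun"
echo "$laterRun"
```

The only differences between the two commands are the --genomeDir value and the three sjdb options, which are no longer needed once the junctions are already in the saved genome.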

Cheers

Alex

ERA

Apr 7, 2017, 2:18:58 PM

to rna-...@googlegroups.com

Hi Alex,

No, it did not finish, even after three days.

Re-generating the genome by inserting the junctions on the fly seems to work, even though I got the error message below. Am I right? Could you tell me what this error message means and how to solve it? I have attached the Log.out file.

Starting to map file # 0 …

EXITING because of FATAL ERROR: phtread error while creating thread # 4, error code: 11

Apr 07 13:44:16 ...... FATAL ERROR, exiting

Thanks,

ERA

LL2017_219_allGenGener_Log.out

ERA

Apr 10, 2017, 11:10:31 AM

to rna-star

Hi Alex,

Please close this ticket.

Thanks,

ERA
