Job stopped when sorting Suffix Array chunks and saving them to disk...


hex...@gmail.com

Mar 9, 2015, 10:37:18 AM3/9/15
to rna-...@googlegroups.com
Hello,

I ran the 2-pass STAR procedure with the following command:

qsub -P diag -pe make 4 -q highmem.q -l mem_free=160G -V -b y /diag/home/atl2014/ap/STAR-STAR_2.4.0f1/source/STAR --runMode genomeGenerate --genomeDir /diag/home/atl2014/SNPanalysis/2pass_LaBT --genomeFastaFiles /diag/home/atl2014/SNPanalysis/GenomeSequence/Atlantic_salmon.fa --sjdbFileChrStartEnd /diag/home/atl2014/SNPanalysis/1PASS_LaBT/SJ.out.tab --sjdbOverhang 75 --runThreadN 4 --outFileNamePrefix /diag/home/atl2014/SNPanalysis/2pass_LaBT/2pass_LaBT --limitGenomeGenerateRAM 660982717496


Mar 07 23:36:04 ..... Started STAR run
Mar 07 23:36:04 ... Starting to generate Genome files
Mar 08 00:03:13 ... finished processing splice junctions database ...
Mar 08 01:08:11 ... starting to sort  Suffix Array. This may take a long time...
Mar 08 01:26:15 ... sorting Suffix Array chunks and saving them to disk...

Then, it stopped running.

Does anybody know how to fix this problem? Can I run the same command again without deleting the output files? (Will STAR resume from where it stopped?)

Thanks,

Xiaoping


Alexander Dobin

Mar 12, 2015, 1:02:52 PM3/12/15
to rna-...@googlegroups.com
Hi Xiaoping,

your STAR parameters look fine, except for --limitGenomeGenerateRAM 660982717496, which (i) appears to contain a dash instead of a space, and (ii) requests 660,982,717,496 bytes (~660 GB), which is bigger than your mem_free request - and you probably do not need that much anyway. What is the genome size? If this does not help, please send me the Log.out file.
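To illustrate the mismatch, here is a quick sketch (not STAR code) using the numbers from the command above:

```python
# Sanity check: STAR's RAM limit should fit inside the scheduler reservation.
limit_bytes = 660_982_717_496         # value passed to --limitGenomeGenerateRAM
mem_free_bytes = 160 * 10**9          # qsub -l mem_free=160G reservation

print(round(limit_bytes / 10**9))     # ~661 GB asked of STAR
print(limit_bytes <= mem_free_bytes)  # False: STAR may try to use more RAM
                                      # than the scheduler granted
```

When the limit exceeds the reservation, the batch system can kill the job mid-run - e.g. during the suffix-array sort - with no error in STAR's own log.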
STAR will not re-use the old files; it's best to delete them before a new run.

Cheers
Alex

hex...@gmail.com

Mar 12, 2015, 4:39:28 PM3/12/15
to rna-...@googlegroups.com
Hi Alex,

Thank you very much for your reply.

As to the command, that is a strikethrough rather than a dash; it should not affect the command, as it does not appear when I paste the command into SSH. As to --limitGenomeGenerateRAM, when I ran without this parameter, STAR reported an error and suggested increasing the default to a certain number, so I set it slightly above the suggested value.

The draft genome of the species I study is about 3 Gb. The Log.out file is very large; here are the last lines from one of the failed Log.out files:

/diag/home/atl2014/SNPanalysis/GenomeSequence/Atlantic_salmon.fa : chr # 944542  "gb|AGKD03944543.1|" chrStart: 247810490368
/diag/home/atl2014/SNPanalysis/GenomeSequence/Atlantic_salmon.fa : chr # 944543  "gb|AGKD03944544.1|" chrStart: 247810752512
/diag/home/atl2014/SNPanalysis/GenomeSequence/Atlantic_salmon.fa : chr # 944544  "gb|AGKD03944545.1|" chrStart: 247811014656
/diag/home/atl2014/SNPanalysis/GenomeSequence/Atlantic_salmon.fa : chr # 944545  "gb|AGKD03944546.1|" chrStart: 247811276800
/diag/home/atl2014/SNPanalysis/GenomeSequence/Atlantic_salmon.fa : chr # 944546  "gb|AGKD03944547.1|" chrStart: 247811538944
/diag/home/atl2014/SNPanalysis/GenomeSequence/Atlantic_salmon.fa : chr # 944547  "gb|AGKD03944548.1|" chrStart: 247811801088
Mar 08 15:40:39 ... finished processing splice junctions database ...
Writing genome to disk...Writing 247864744565 bytes into /diag/home/atl2014/SNPanalysis/2pass_LaBT/Genome ; empty space on disk = 2886055493632 bytes ... done
 done.
Number of SA indices: 6199208696
SA size in bytes: 30221142394
Mar 08 16:46:32 ... starting to sort  Suffix Array. This may take a long time...
Number of chunks: 4;   chunks size limit: 16531223184 bytes
Mar 08 17:04:28 ... sorting Suffix Array chunks and saving them to disk...
Writing 480102248 bytes into /diag/home/atl2014/SNPanalysis/2pass_LaBT/SA_3 ; empty space on disk = 2349025198080 bytes ... done
 

I have run the 2-pass procedure for a few samples. The 1st and 3rd samples ran successfully. The 2nd sample failed once, but succeeded when I ran it again without deleting the output files. I have tried my 4th sample a few times, but it always fails.

How many CPUs and how much RAM are required for a 2-pass run? Are 160 GB of RAM and 4 (or 2) CPUs enough for STAR?

Thank you very much.

Xiaoping

Alexander Dobin

Mar 15, 2015, 4:19:30 PM3/15/15
to rna-...@googlegroups.com
Hi Xiaoping,

your genome draft has a very large number of contigs; that's why the memory requirements are so huge with the default parameters. Please try --genomeChrBinNbits 14 (or you may even go down to 12). This should reduce the RAM requirements significantly, to <50 GB, i.e. you should be able to use --limitGenomeGenerateRAM 50000000000.
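For reference, the STAR manual recommends scaling this parameter as min(18, log2(GenomeLength/NumberOfReferences)). A quick sketch with the approximate numbers from your thread (3 Gb genome, ~944,548 contigs per the Log.out above):

```python
import math

# Recommended scaling from the STAR manual:
#   genomeChrBinNbits = min(18, log2(GenomeLength / NumberOfReferences))
genome_length = 3_000_000_000   # ~3 Gb draft genome (from the thread)
n_references = 944_548          # approximate contig count from Log.out

bin_bits = min(18, int(math.log2(genome_length / n_references)))
print(bin_bits)  # 11 - consistent with trying 14, or going down to 12
```

Each reference sequence is padded out to a whole number of 2^genomeChrBinNbits-byte bins, so with nearly a million contigs the default bin size inflates the index enormously; shrinking the bins is what brings the RAM back down.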

Cheers
Alex