Indexing genome reference with STAR

Erena Edae

Nov 28, 2017, 1:11:06 PM

to rna-star

Hi all,

I was trying to index a reference genome sequence on a remote server. I keep getting the following error. Any help is appreciated.

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/var/spool/torque/mom_priv/jobs/4556453.mesabim3.msi.umn.edu.SC: line 12: 3111 Aborted /home/rousem/edaex002/STAR-2.5.3a/bin/Linux_x86_64/STAR --runThreadN 16 --runMode genomeGenerate --genomeDir wheatv1_pseudomolecules_parts_index --genomeFastaFiles /home/rousem/edaex002/IWGSC_RefSeqv1.0/Wheat_IWGSC_WGA_v1.0_pseudomolecules/161010_Chinese_Spring_v1.0_pseudomolecules_parts.fasta --sjdbGTFtagExonParentTranscript /home/rousem/edaex002/IWGSC_RefSeqv1.0/iwgsc_refseqv1.0_HighConf_2017Mar13.gff3 --limitGenomeGenerateRAM 38805727275 --genomeLoad LoadAndRemove --genomeChrBinNbits 12

Alexander Dobin

Nov 28, 2017, 6:54:14 PM

to rna-star

Hi Erena,

This most likely means that you do not have enough RAM.

Please send me the Log.out file of this run.

Cheers

Alex

Erena Edae

Dec 1, 2017, 4:38:41 PM

to rna-star

On Friday, December 1, 2017 at 10:43:31 AM UTC-6, Erena Edae wrote:

Hi Alex,
Thank you for your response. The following is the job status at termination; I do not have a log file.
PBS Job Id: 4566045.mesabim3.msi.umn.edu
Job Name: Index
Exec host: cn0135/0-23
Execution terminated
Exit_status=134
resources_used.cput=28:05:59
resources_used.vmem=73321308kb
resources_used.walltime=01:39:01
resources_used.mem=62914560kb
resources_used.energy_used=0
req_information.task_count.0=1
req_information.lprocs.0=24
req_information.memory.0=62914560kb
req_information.thread_usage_policy.0=allowthreads
req_information.hostlist.0=cn0135:ppn=24
req_information.task_usage.0.task.0={"task":{"cpu_list":"0-5,12-17,6-11,18-23","mem_list":"0-1","cores":0,"threads":24,"host":"cn0135"}}

Thanks,
Erena.
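
A note on reading the job status above: resources_used.mem equals req_information.memory.0 (both 62914560kb), which is consistent with the job reaching its memory ceiling. Below is a small shell sketch for converting those figures to bytes so they can be compared with --limitGenomeGenerateRAM. It assumes that "kb" in the Torque report means 1024 bytes; the helper name kb_to_bytes is made up for illustration, and the limit value is the one from the command in the first post, which may not be what this particular job used.

```shell
# Convert a Torque-style size such as "62914560kb" to bytes.
# Assumption: "kb" here means 1024 bytes.
kb_to_bytes() {
    local kb=${1%kb}          # strip the trailing "kb" suffix
    echo $(( kb * 1024 ))
}

requested=$(kb_to_bytes 62914560kb)   # req_information.memory.0
used=$(kb_to_bytes 62914560kb)        # resources_used.mem
limit=38805727275                     # --limitGenomeGenerateRAM from the first post

echo "requested=$requested used=$used limit=$limit"
if [ "$used" -ge "$requested" ]; then
    echo "job used all of the memory it requested"
fi
```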

Alexander Dobin

Dec 3, 2017, 3:20:46 PM

to rna-star

Hi Erena

Inside the directory you are running STAR from, you should see the Log.out file.

In the script that you submit to the queue, before running STAR, you can use

cd /path/to/my/run/dir/

You can make this directory the same as the one you pass to --genomeDir; then the Log.out file will be stored together with all the genome files.

This file contains all the important metrics about the run - I need to see it to diagnose the problem.

Cheers

Alex
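
The advice above can be sketched as a job script fragment. This is only an illustration: RUN_DIR and GENOME_FASTA are hypothetical variable names, the thread count is copied from the first post, and the STAR call is skipped unless STAR is on the PATH and the FASTA file exists.

```shell
#!/bin/bash
# Hypothetical run directory; here it doubles as the --genomeDir.
RUN_DIR="${RUN_DIR:-$PWD/star_index}"
mkdir -p "$RUN_DIR"

# Change into the run directory before calling STAR so that Log.out
# ends up next to the genome files.
cd "$RUN_DIR" || exit 1

# GENOME_FASTA is a placeholder for the path to the reference FASTA.
if command -v STAR >/dev/null 2>&1 && [ -f "${GENOME_FASTA:-}" ]; then
    STAR --runMode genomeGenerate \
         --runThreadN 16 \
         --genomeDir "$RUN_DIR" \
         --genomeFastaFiles "$GENOME_FASTA"
else
    echo "STAR or the FASTA file was not found; nothing was run" >&2
fi
```

After the job ends, Log.out should then be found in the run directory even if the job was killed.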

Erena Edae

Dec 6, 2017, 7:57:35 PM

to rna-star

Hi Alex,

I think it worked now with 1 TB of RAM. Take a look at the following from the end of the Log.out file.
Thank you for your help.
Erena.

Writing 1896886800 bytes into /home/rousem/edaex002/STAR-2.5.3a/wheatv1_pseudomolecules_index/SA_24 ; empty space on disk = 3521596998041600 bytes ... done
Dec 06 14:36:16 ... loading chunks from disk, packing SA...
Dec 06 14:47:27 ... finished generating suffix array
Dec 06 14:47:27 ... generating Suffix Array index
Dec 06 14:56:40 ... completed Suffix Array index
Dec 06 14:56:40 ... writing Genome to disk ...
Writing 14550302720 bytes into /home/rousem/edaex002/STAR-2.5.3a/wheatv1_pseudomolecules_index/Genome ; empty space on disk = 3521198504607744 bytes ... done
SA size in bytes: 124876315264
Dec 06 14:57:23 ... writing Suffix Array to disk ...
Writing 124876315264 bytes into /home/rousem/edaex002/STAR-2.5.3a/wheatv1_pseudomolecules_index/SA ; empty space on disk = 3521162138107904 bytes ... done
Dec 06 15:04:53 ... writing SAindex to disk
Writing 8 bytes into /home/rousem/edaex002/STAR-2.5.3a/wheatv1_pseudomolecules_index/SAindex ; empty space on disk = 3520754257412096 bytes ... done
Writing 120 bytes into /home/rousem/edaex002/STAR-2.5.3a/wheatv1_pseudomolecules_index/SAindex ; empty space on disk = 3520754257412096 bytes ... done
Writing 1655351975 bytes into /home/rousem/edaex002/STAR-2.5.3a/wheatv1_pseudomolecules_index/SAindex ; empty space on disk = 3520754257412096 bytes ... done
Dec 06 15:05:00 ..... finished successfully
DONE: Genome generation, EXITING

Alexander Dobin

Dec 11, 2017, 4:25:46 PM

to rna-star

Hi Erena,

1TB seems to be too much. The SA file size is only ~14GB. What is the size of the Genome file in the genome directory?

How many chromosomes/scaffolds are in the assembly? If there are more than ~1000, I would recommend reducing --genomeChrBinNbits, which should scale like log2(GenomeSize/NumberOfReferences).

Cheers

Alex
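
A rough way to work out the scaling mentioned above, as a shell sketch. The function names are made up for illustration and the logarithm is rounded down to an integer. The cap at 18 (the default for --genomeChrBinNbits) follows the STAR manual rather than this thread, so treat it as an assumption; the manual's fuller formula also takes read length into account, which is not modelled here.

```shell
# floor(log2(n)) for a positive integer n, using integer arithmetic only.
ilog2() {
    local n=$1 bits=0
    while [ "$n" -gt 1 ]; do
        n=$(( n / 2 ))
        bits=$(( bits + 1 ))
    done
    echo "$bits"
}

# Suggested --genomeChrBinNbits: log2(GenomeSize / NumberOfReferences),
# rounded down and capped at 18 (assumed upper bound, see the note above).
chr_bin_nbits() {
    local genome_size=$1 n_refs=$2
    local per_ref=$(( genome_size / n_refs ))
    if [ "$per_ref" -lt 1 ]; then
        per_ref=1
    fi
    local bits
    bits=$(ilog2 "$per_ref")
    if [ "$bits" -gt 18 ]; then
        bits=18
    fi
    echo "$bits"
}

# Genome size in bytes from the Log.out excerpt, 21 chromosomes.
chr_bin_nbits 14550302720 21
```

With the Genome size from the log (14550302720 bytes) and 21 chromosomes this prints 18, so by this rule there is no need to lower the value for an assembly with so few references.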

edae...@umn.edu

Dec 17, 2017, 1:50:12 PM

to rna-star

Hi Alex,

The size of the Genome file is 16G and there are 21 chromosomes. I think it was successfully indexed. However, the mapping did not work with the following script:

--runThreadN 24 --readFilesIn ~/umbellulata_RP_Trinity.fa ~/umbellulata_SP_Trinity.fa --genomeDir /home/rousem/edaex002/STAR-2.5.3a/wheatv1_pseudomolecules_index --limitGenomeGenerateRAM 990000000000

In the above I was trying to map RNA-seq assemblies from two different samples to the reference genome, instead of short reads.

Thanks,
Erena.

Alexander Dobin

Dec 19, 2017, 10:58:13 AM

to rna-star

Hi Erena,

Please send me the Log.out file of this run.

Are you trying to map long assembled transcripts from Trinity? If your reads are longer than ~300b, you would need to use STARlong.
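
To check whether the ~300 b threshold applies, the longest sequence in each FASTA file can be measured first. This is a minimal shell sketch, assuming plain (uncompressed) FASTA input; the function names are hypothetical and the cutoff of 300 is simply the figure quoted above.

```shell
# Print the length of the longest sequence in a FASTA file.
# Sequence lines belonging to one record are summed until the next ">" header.
max_seq_len() {
    awk '
        /^>/ { if (len > max) max = len; len = 0; next }
        { len += length($0) }
        END { if (len > max) max = len; print max + 0 }
    ' "$1"
}

# Suggest which binary to use, based on the ~300 b rule of thumb.
pick_star_binary() {
    if [ "$(max_seq_len "$1")" -gt 300 ]; then
        echo STARlong
    else
        echo STAR
    fi
}

# Example (path taken from the mapping command earlier in the thread):
#   pick_star_binary ~/umbellulata_RP_Trinity.fa
```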