STAR_2.5.1a: genome index taking a long time


Vinu

May 11, 2016, 12:58:42 PM
to rna-star

Hi,

Generating the index for my genome has taken more than 15 hours (and is still running) with STAR_2.5.1a. The command is below.


Command used :

STAR --runMode genomeGenerate --genomeDir /path/to/STAR_GenomeDIR --genomeFastaFiles /path/to/Genome.fa --sjdbGTFfile /path/to/Genome.gtf --sjdbOverhang 100 --limitGenomeGenerateRAM 156067603157 --runThreadN 20

With the older version (STAR_2.4.2a) and the same genome build, it took less than 2 hours. That command was the same as above, except that --runThreadN was not specified.


Please clarify a couple of points:

1) Is there any mistake in the command above?
2) Should I use STARlong instead? When should one use STARlong?

P.S.: For both STAR versions mentioned above, I am using the precompiled static binary.

-Thanks and regards
MV

Alexander Dobin

May 11, 2016, 1:17:38 PM
to rna-star
Hi Vinu,

please send me the Log.out file from this run. You are allowing 150GB of RAM. Do you have that much available on your machine?

Cheers
Alex

Vinu

May 12, 2016, 2:09:24 AM
to rna-star
Hi,

I have 256 GB of RAM. The genome index finished in 18 hours, but when I try to align the reads, it fails. I am attaching both log files.
Log.out
Log.out.genomBuild

Alexander Dobin

May 16, 2016, 4:55:57 PM
to rna-star
Hi Vinu,

your mapping job fails because you passed the same FASTA file at the mapping stage that you used at the genome generation step. STAR therefore tries to add the same sequences to those already in the index, which causes a memory overflow. If you omit --genomeFastaFiles /opt/share/DATA/Hybrid_Genome/HybridGenome.fa at the mapping stage, it will work.
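
For illustration, a mapping invocation without --genomeFastaFiles might look roughly like the sketch below. The read files, output prefix, and paired-end layout are placeholders assumed for the example, not values taken from the attached Log.out. The script only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch of a mapping run against an already generated index.
# All paths below are placeholders (assumptions), not values from Log.out.
GENOME_DIR=/path/to/STAR_GenomeDIR
READS_1=/path/to/reads_1.fastq
READS_2=/path/to/reads_2.fastq
OUT_PREFIX=/path/to/output/sample_

# Build the command as a string so it can be inspected first.
# Note: no --genomeFastaFiles here; the sequences are already in the index.
STAR_CMD="STAR --runMode alignReads --genomeDir $GENOME_DIR --readFilesIn $READS_1 $READS_2 --runThreadN 20 --outFileNamePrefix $OUT_PREFIX"

echo "$STAR_CMD"
# Uncomment to run once the placeholders are replaced with real paths:
# $STAR_CMD
```

The index directory already contains the genome sequences, so --genomeDir alone is enough to locate them at the mapping stage.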

A couple more recommendations:
1. Your genome seems to contain a lot of PATCH and HAPlotype references. I generally recommend against including them, as they unnecessarily increase the genome size and result in a higher multimapping rate.
2. At least some portions of the "gtf" file look more like GFF, with a "Parent" attribute rather than "transcript_id" and "gene_id". I recommend converting the GFF format into GTF for better compatibility.
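
One rough way to see how much of an annotation file is GFF-style is to count exon lines by the attribute style in column 9. The sketch below assumes a tab-separated file with nine columns and "exon" in the third column; the function name check_gtf_attributes is made up for this example and is not part of STAR.

```shell
#!/bin/sh
# Count exon lines whose 9th column looks GTF-like vs GFF-like.
# Usage: check_gtf_attributes /path/to/Genome.gtf
check_gtf_attributes() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }          # skip comments and malformed lines
        $3 != "exon"   { next }          # only exon lines are examined here
        $9 ~ /transcript_id "/ && $9 ~ /gene_id "/ { gtf++; next }
        $9 ~ /Parent=/ { gff++; next }   # GFF-style parent reference
        { other++ }
        END { printf "gtf_like=%d gff_like=%d other=%d\n", gtf, gff, other }
    ' "$1"
}
```

A non-zero gff_like count suggests those records should be converted to GTF before the index is generated.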

Cheers
Alex