STAR running problem with --sjdbOverhang


Amol Ghodke

May 14, 2015, 9:17:12 AM
to rna-...@googlegroups.com
Hi all, 

I was running STAR using the following commands:

STAR --runThreadN 8 --sjdbOverhang 245 --runMode genomeGenerate --genomeDir . --genomeFastaFiles $GENOME --sjdbFileChrStartEnd gene_set.introns  2> star_build.log 

STAR --runThreadN 8 --readFilesIn R1.fq R2.fq --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.2  --outReadsUnmapped Fastx --limitBAMsortRAM 8729257684 --genomeDir . --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonical --outSAMattrRGline ID:"$SPECIES" 2> star.log


It was running well until I updated STAR. After updating, it gives me the following error:

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=100 is not equal to the value at the genome generation step =245

I don't know where the problem is, since the --sjdbOverhang value is the same as before.
Can somebody help me?

Thank you in advance 
Amol

Alexander Dobin

May 15, 2015, 4:54:16 PM
to rna-...@googlegroups.com, amo...@gmail.com
Hi Amol,

this is a problem I introduced in 2.4.1 when I switched to a default value of --sjdbOverhang = 100 and allowed it to be specified at the mapping stage for on-the-fly junction insertion.
It will be fixed in a patch in the next few days. In the meantime, please specify --sjdbOverhang 245 at the mapping stage.
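
Editorial sketch: one way to apply this workaround without retyping the number is to read the overhang back from the index. The helper below is hypothetical (the name `index_overhang` is made up), and it assumes the genome directory contains a genomeParameters.txt file with a whitespace-separated `sjdbOverhang` line; check your own index before relying on it.

```shell
#!/bin/sh
# Print the --sjdbOverhang value recorded in a STAR genome directory.
# Assumption: genomeParameters.txt holds a line "sjdbOverhang<whitespace><value>".
index_overhang() {
    awk '$1 == "sjdbOverhang" { print $2; exit }' "$1/genomeParameters.txt"
}

# Usage (not run here): pass the recorded value at the mapping stage.
# STAR --genomeDir . --sjdbOverhang "$(index_overhang .)" --readFilesIn R1.fq R2.fq
```

If the file or the line is missing, the function prints nothing, so a caller would want to check for an empty result before building the STAR command.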

Cheers
Alex

mbritton

Feb 2, 2016, 8:46:18 PM
to rna-star, amo...@gmail.com
Hi Alex, et al,

Was this problem fixed, or does it require special handling during installation or genome indexing?
We just installed STAR 2.5.1b, and I created a new genome index including "--sjdbOverhang 99" (for PE100 reads).

I'm comparing alignment of the PE100 reads (after trimming) with alignment of merged R1/R2 reads, which gives longer single-end reads (up to 190 bp), so I tried to include "--sjdbOverhang 189" in the mapping command, but I continue to get the error:

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=189 is not equal to the value at the genome generation step =99

Do I need to index the genome with some "NULL" value for --sjdbOverhang which will then allow it to be changed at the mapping step?

Thanks in advance,

Monica Britton

Alexander Dobin

Feb 5, 2016, 3:25:42 PM
to rna-star, amo...@gmail.com
Hi Monica,

if you generated a genome with --sjdbOverhang 99, you cannot specify a different value at the mapping stage; the algorithm can only work with one value.
If you want to use two different values, you would need to generate the genome without annotations, and then use --sjdbOverhang 99 at the mapping stage for 100b reads, and --sjdbOverhang 189 for 190b reads.

However, the value 99 will work well for 190b reads; you will see only very minute changes with --sjdbOverhang 189.
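
Editorial sketch: the two-value setup can be written as a dry run that only prints the mapping commands. It assumes an index generated without annotations and an overhang of read length minus 1, as in the values quoted in this thread. The function name `star_map_cmd` and the file and directory names are placeholders, not anything STAR provides.

```shell
#!/bin/sh
# Dry run: print (do not execute) a STAR mapping command that inserts
# annotated junctions on the fly, with the overhang set to read length - 1.
# Assumes the index in $1 was generated without annotations.
star_map_cmd() {
    genome_dir=$1; gtf=$2; read_len=$3
    shift 3
    echo "STAR --genomeDir $genome_dir --sjdbGTFfile $gtf" \
         "--sjdbOverhang $((read_len - 1)) --readFilesIn $*"
}

# Placeholder inputs: paired 100b reads and merged reads of up to 190b.
star_map_cmd idx_noannot genes.gtf 100 R1.fq R2.fq
star_map_cmd idx_noannot genes.gtf 190 merged.fq
```

Printing the command first makes it easy to confirm the overhang before committing to a long mapping run.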

Cheers
Alex

Gary Ho

Nov 14, 2016, 11:32:38 AM
to rna-star, amo...@gmail.com
Hi Alex,
The sjdbOverhang mismatch issue also happened to me. I installed 2.5.2b, and the sjdbOverhang value was not specified in the genome generation step. Instead, I tried to supply this value on the fly (--sjdbOverhang 75), that is, during the mapping step. The error message then comes up, saying that the value 75 is not equal to the default setting of 100. Is this a bug, or did I not do it properly?

Best!
Gary  


Alexander Dobin

Nov 17, 2016, 1:07:11 PM
to rna-star, amo...@gmail.com
Hi Gary,

if you generate the genome with a GTF file and do not specify a value for --sjdbOverhang, it will be set to the default of 100.
If you want to be able to set an arbitrary value of --sjdbOverhang on the fly, you have to generate the genome without annotations (GTF); then you supply both the --sjdbOverhang and the GTF file at the mapping step.

Cheers
Alex

Steve Stelman

Apr 20, 2017, 3:18:40 PM
to rna-star
Hi Alex,

Between this thread and the other at https://groups.google.com/forum/#!msg/rna-star/h9oh10UlvhI/BfSPGivUHmsJ, I am still wondering about best practices.

In our case, we are trying to develop a generic pipeline where the input may be 75bp, 100bp, or 150bp (and maybe other read lengths). It seems that for most cases the default value for --sjdbOverhang is sufficient, but I wonder why it is not a best practice to simply build the genome indices without the GTF file and use --sjdbGTFfile and --sjdbOverhang during the mapping step for on-the-fly application, so that --sjdbOverhang will always be exactly optimal for the data set. While there is some additional CPU time required for this method, in my tests with hg19 it only added a couple of minutes to the mapping. Are there any reasons why on-the-fly junction insertion is not a good idea?

(If it's not a good idea, I suppose alternatively we could prebuild the genome indices with several different --sjdbOverhang values and choose the correct one during mapping.)
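
Editorial sketch of the prebuilt-index alternative: keep one index per overhang and pick the matching directory from the read length. The directory layout (`star_idx/overhang_N`), the helper name `index_for_read_length`, and the file names are all hypothetical. The loop only prints the genomeGenerate commands; it does not run STAR.

```shell
#!/bin/sh
# Hypothetical layout: one prebuilt index per overhang, e.g. star_idx/overhang_99.
# Prints the index directory matching a read length (overhang = read length - 1).
index_for_read_length() {
    echo "$1/overhang_$(( $2 - 1 ))"
}

# Dry run: print the genomeGenerate command for each expected read length.
for len in 75 100 150; do
    echo "STAR --runMode genomeGenerate --genomeDir $(index_for_read_length star_idx "$len")" \
         "--genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --sjdbOverhang $((len - 1))"
done
```

At the mapping step, a pipeline would then pass `--genomeDir "$(index_for_read_length star_idx "$len")"` for the read length of the current data set.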

I appreciate your thoughts,

Steve

Alexander Dobin

Apr 20, 2017, 3:29:49 PM
to rna-star
Hi Steve,

the only disadvantages of adding the junctions on the fly are the extra time spent in each run (a few minutes), more RAM (<1GB) used for --sjdbOverhang >100, and a bit more RAM (<1GB) used temporarily. The latter can be reduced by setting --limitSjdbInsertNsj to the number of junctions from the genome. However, the increase in sensitivity for 75-150b reads is very small, so I am not sure it's worth the inconvenience.
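
Editorial sketch: if the junctions come from a plain junction list, such as the gene_set.introns file passed to --sjdbFileChrStartEnd in the first post, a value for --limitSjdbInsertNsj could be obtained by counting its lines. The helper name `count_junctions` is made up, and the sketch assumes one junction per non-empty line with no header. It does not apply to a GTF file, where the number of junctions is not a simple line count.

```shell
#!/bin/sh
# Count junctions in a junction list with one junction per non-empty line
# (the layout assumed here for files such as gene_set.introns).
count_junctions() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Usage (not run here):
# STAR ... --limitSjdbInsertNsj "$(count_junctions gene_set.introns)"
```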

Cheers
Alex

Tom Snir

Aug 22, 2017, 12:03:39 PM
to rna-star
Hello,

I would like to clarify something related to the above conversation. I am running STAR 2.5.3a. My initial STAR run (I believe this is the indexing stage) was with --sjdbOverhang 100, and I am later running another tool called zUMIs, which calls STAR. zUMIs asks for the read length, which in my case is 50 bp. I assume zUMIs passed this on to STAR, which resulted in this error:

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=49 is not equal to the value at the genome generation step =100

If I understand correctly, my options right now are:

- running STAR indexing again using --sjdbOverhang 49 and then trying zUMIs with 50 BP as I did before
- running STAR indexing without a GTF file at all, and this would prevent the error?

Part of my problem is that I am not sure whether zUMIs would be OK with me giving a larger number, let's say 100, as the read length, since it might use this later for something else.

Thank you,

Tom Snir.

Alexander Dobin

Aug 23, 2017, 6:32:35 PM
to rna-star
Hi Tom,

I think the best option in your case is 

- running STAR indexing again using --sjdbOverhang 49 and then trying zUMIs with 50 BP as I did before

Do you know which command zUMIs uses to call STAR? If it has both --sjdbGTFfile ... and --sjdbOverhang ..., and you supply the GTF file, then the GTF file will be used on the fly and you can run STAR indexing without the GTF.

Cheers
Alex

Tom Snir

Aug 24, 2017, 1:06:43 AM
to Alexander Dobin, rna-star
Hi Alex,

Thanks for the quick reply. I do not know how zUMIs calls STAR, but I know it is possible to use it without a STAR genome index (it is stated on their wiki). The only requirements are the STAR genome directory and a GTF file.

In either case, I re-ran STAR indexing using --sjdbOverhang 49 and ran zUMIs again, and this time everything went fine, so this can be considered solved. One final question though: can I expect better results with either of the options I initially mentioned (re-indexing with an overhang of 49, or indexing without a GTF file and letting STAR use the GTF on the fly)?

Thank you,

Tom.




Swati

Aug 25, 2017, 5:18:50 AM
to rna-star, ado...@gmail.com
Hi Alex and Tom,

This is Swati (a developer of zUMIs - https://github.com/sdparekh/zUMIs ). zUMIs is a pipeline for analysing RNA-seq data, compatible with most UMI-based RNA-seq protocols.

Sorry, I saw this post a bit late, but it would be good to clarify how zUMIs uses STAR.
  1. zUMIs suggests that users generate the STAR genome index without a specific overhang and without a GTF file.
  2. It takes the STAR genome directory, read length and GTF file as input and adds the junctions on the fly using a "readlength-1" overhang (as explained on the zUMIs wiki page: https://github.com/sdparekh/zUMIs/wiki/Usage).
  3. It is also flexible with specific STAR parameters and accepts them with -x, e.g. -x "--genomeFastaFiles spikes.fasta".
It is good to know that Tom's problem is solved, but for future reference it would be better to generate the STAR index without --sjdbOverhang and --sjdbGTFfile.

Thanks Alex for STAR and your quick response to all the issues.

Best
Swati

Alexander Dobin

Aug 25, 2017, 12:35:35 PM
to rna-star, ado...@gmail.com
Hi Swati,

thanks a lot for the explanations!

Cheers
Alex

Alexander Dobin

Aug 25, 2017, 12:37:11 PM
to rna-star, ado...@gmail.com
Hi Tom,

whether the GTF file is included in the index generation step, or on-the-fly, the results will be exactly the same.

Cheers
Alex