the --sjdbOverhang parameter in generating index

Iris Zhu

Jan 24, 2018, 2:23:34 PM
to rna-star
Hi, I am new to STAR. I am trying to build an hg19 index for STAR. There is a parameter --sjdbOverhang, which is supposed to be ReadLength-1. My question is: do I need to re-build the index each time for sequence reads of a different length?
For example, if my reads are 100 nt this time, I use --sjdbOverhang 99; if my reads are 75 nt next time, do I need to build a different index with --sjdbOverhang 74?
Thank you for your help,

Iris

Iris Zhu

Jan 26, 2018, 11:07:29 AM
to rna-star
Could anybody answer my question? I would really appreciate it. 
I have multiple RNA-seq data sets to analyze, with read lengths of 65, 75, 100, and 125 nt. Building 4 different indexes is not a big deal, but I just want to make sure that is what's needed if I want to use STAR.
Thank you,
Iris 

Alexander Dobin

Jan 26, 2018, 12:53:32 PM
to rna-star
Hi Iris,

you can simply use the default value (100) for all of your cases. This parameter makes almost no difference for reads longer than 50b.

Cheers
Alex

Sarah Young

Mar 9, 2018, 3:45:54 PM
to rna-star
Hi Alex,

We have the same issue, except that we need a reference we can use on 25bp reads and on lengths up to 50bp and beyond. What do you recommend in that case? Should we use multiple index builds?

Thanks for your help!

~Sarah

Alexander Dobin

Mar 9, 2018, 4:17:43 PM
to rna-star
Hi Sarah,

for 25b reads, I would recommend generating a separate index with --sjdbOverhang 24.
For reads longer than 50b, you can use --sjdbOverhang 100.

Cheers
Alex
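
[Editor's sketch] The advice so far in this thread can be written down as a small Python helper. This is only an illustration, not something posted by Alex: the function names, the file paths, and the treatment of reads of exactly 50 bases (handled here as "short", so ReadLength-1) are assumptions. The STAR options used (--runMode genomeGenerate, --genomeDir, --genomeFastaFiles, --sjdbGTFfile, --sjdbOverhang) are the standard index-generation options.

```python
def choose_sjdb_overhang(read_length, short_read_cutoff=50, default_overhang=100):
    """Pick an --sjdbOverhang value following the advice in this thread.

    Assumption: reads of exactly `short_read_cutoff` bases are treated
    like short reads (ReadLength-1); the thread only covers "longer than 50b".
    """
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    if read_length > short_read_cutoff:
        # Longer reads: the default value is reported to work well enough.
        return default_overhang
    # Short reads: use ReadLength-1.
    return read_length - 1


def genome_generate_command(read_length, genome_dir, fasta, gtf):
    """Build (but do not run) a STAR index-generation command as an argument list.

    The paths passed in are hypothetical placeholders chosen by the caller.
    """
    overhang = choose_sjdb_overhang(read_length)
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(overhang),
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = genome_generate_command(25, "hg19_sjdb24", "hg19.fa", "genes.gtf")
    print(" ".join(cmd))
```

Returning the command as a list rather than running it keeps the sketch side-effect free; it could be passed to subprocess.run once the paths are real.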

Jean Chang

Mar 20, 2018, 9:57:05 AM
to rna-star
I have an hg19 index built with --sjdbOverhang 24 but recently received 38bp reads. Will I get significantly better alignments if I re-build my index with --sjdbOverhang 37 than if I continue to use my existing index? This thread indicates it should not matter much for reads >50bp but is the impact significant enough for my shorter reads that I should generate new indices for different read lengths if under 50bp?

Thanks,

Jean

Alexander Dobin

Mar 20, 2018, 11:41:44 AM
to rna-star
Hi Jean,

in your case, I would recommend generating the new index. In general, the indexes for longer reads will work fine for shorter reads, but not vice versa.

Cheers
Alex
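
[Editor's sketch] One way to read the rule of thumb above ("indexes for longer reads will work fine for shorter reads, but not vice versa") is as a simple reuse check. The Python below is a hedged illustration under that reading, not Alex's wording: the function name and the exact thresholds (a cutoff of 50 bases and a default overhang of 100, both taken from earlier replies in this thread) are assumptions. Note that Alex still recommends a dedicated --sjdbOverhang 24 index for 25b reads, so "reusable" here means "expected to work", not "optimal".

```python
def index_is_reusable(index_overhang, read_length,
                      default_overhang=100, short_read_cutoff=50):
    """Return True if an existing index is expected to work for these reads.

    Assumptions (editor's interpretation of the thread):
    - an index built for longer reads is acceptable for shorter reads,
      i.e. index_overhang >= read_length - 1;
    - for reads longer than `short_read_cutoff`, an index built with at
      least the default overhang is also acceptable.
    """
    if read_length > short_read_cutoff and index_overhang >= default_overhang:
        return True
    return index_overhang >= read_length - 1


if __name__ == "__main__":
    # Jean's case: index built with --sjdbOverhang 24, new reads are 38bp.
    print(index_is_reusable(24, 38))
    # Iris's case: default index (100) with 125 nt reads.
    print(index_is_reusable(100, 125))
```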