Samples with variable read lengths and genome index.

Joshua Mincer

May 17, 2023, 3:42:32 PM
to rna-star
I'm trying to put together a pipeline to preprocess FASTQ reads and align them to the genome using STAR. For this, I need to generate the genome index.

The samples that I have end up with variable read lengths after preprocessing, ranging from 50bp in some samples to 150bp in others.

The manual states that the STAR genome should be generated with sjdbOverhang set to max(readLength)-1. Should I then generate a separate genome for each of these samples? Or will one genome with an sjdbOverhang of 149 be sufficient for all samples? Thanks for any help!

Joshua Mincer

May 17, 2023, 6:06:33 PM
to rna-star
In addition, if I do decide to use a STAR genome for each of these, would it be safe to use an sjdbOverhang of max(readLength) - 1 as measured BEFORE preprocessing? For example, if all my reads start at 151bp but, after end trimming, the maximum read length per sample is 140, 142, 144, 145, or 150, is a single index built with an overhang of 151-1 still usable for all of the samples?
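
For reference, the arithmetic behind the two candidate values can be sketched in Python. This is only an illustration of the max(readLength)-1 rule applied to the numbers in the post. The helper names `read_lengths` and `sjdb_overhang` are hypothetical and not part of STAR, and the FASTQ helper assumes plain four-line records.

```python
from typing import Iterable, List


def read_lengths(fastq_lines: Iterable[str]) -> List[int]:
    """Return the length of every read in an uncompressed FASTQ stream.

    Assumes the common four-line record layout
    (header, sequence, '+', quality); this is an assumption of the sketch.
    """
    return [len(line.strip()) for i, line in enumerate(fastq_lines) if i % 4 == 1]


def sjdb_overhang(max_read_lengths: Iterable[int]) -> int:
    """Apply the max(readLength)-1 rule across one or more samples."""
    return max(max_read_lengths) - 1


# Per-sample maximum read lengths after trimming, taken from the post.
trimmed_maxima = [140, 142, 144, 145, 150]
after_trimming = sjdb_overhang(trimmed_maxima)   # rule applied after trimming
before_trimming = sjdb_overhang([151])           # rule applied to the raw reads

print(after_trimming, before_trimming)
```

With these numbers the two choices are 149 and 150, so they differ by a single base.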


Alexander Dobin

May 19, 2023, 7:31:38 AM
to rna-star
I would recommend using the default value (100). This parameter typically has only a very marginal effect on alignments.
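
As a rough illustration of this advice, a single index shared by all samples could be generated with a command like the one assembled below. This is a sketch, not a command taken from the thread: the paths `star_index`, `genome.fa`, and `annotation.gtf`, the thread count, and the helper `star_genome_generate_cmd` are all placeholders. The flags are standard STAR genome-generation options, and `--sjdbOverhang` is written out explicitly even though 100 is the default.

```python
import shlex
from typing import List


def star_genome_generate_cmd(
    genome_dir: str,
    fasta: str,
    gtf: str,
    sjdb_overhang: int = 100,  # the default value recommended above
    threads: int = 4,          # placeholder; adjust to the machine
) -> List[str]:
    """Build the argument list for one STAR genomeGenerate run."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang),
        "--runThreadN", str(threads),
    ]


# Hypothetical paths, for illustration only.
cmd = star_genome_generate_cmd("star_index", "genome.fa", "annotation.gtf")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files, and the same index directory can then be reused for every sample regardless of its trimmed read length.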