Samples with variable read lengths and genome index.

Joshua Mincer

May 17, 2023, 3:42:32 PM

to rna-star

Hello,

I'm trying to put together a pipeline to preprocess fastq reads and align to the genome using STAR. For this, I need to generate the genome index.

After preprocessing, my samples have variable read lengths, ranging from 50 bp in some samples to 150 bp in others.

The manual states that sjdbOverhang should ideally be max(readLength) - 1. Should I then generate a separate genome index for each of these samples, or will a single genome index with an sjdbOverhang of 149 be sufficient for all samples? Thanks for any help!
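A minimal Python sketch of the rule quoted from the manual, applied across several samples: find the longest read in each FASTQ file and subtract 1. It assumes standard 4-line FASTQ records (plain or gzip-compressed); the function names and file paths are hypothetical and not part of STAR.

```python
import gzip
from pathlib import Path


def max_read_length(fastq_path):
    """Return the longest sequence length in a FASTQ file (plain or .gz).

    Assumes standard 4-line FASTQ records, where the sequence is the
    second line of each record.
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    longest = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of each 4-line record
                longest = max(longest, len(line.strip()))
    return longest


def recommended_sjdb_overhang(fastq_paths):
    """Apply the manual's rule across all files: max(readLength) - 1."""
    return max(max_read_length(p) for p in fastq_paths) - 1
```

For samples whose longest reads range from 50 bp to 150 bp, `recommended_sjdb_overhang(["sampleA.fastq.gz", "sampleB.fastq.gz"])` (hypothetical paths) would return 149, the value proposed in the question.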

Joshua Mincer

May 17, 2023, 6:06:33 PM

to rna-star

In addition, rather than generating a genome for each sample, would it be safe to set sjdbOverhang to max(readLength) - 1 using the read length BEFORE preprocessing? For example, if all my reads start at 151 bp but, after end trimming, the maximum read length per sample is 140, 142, 144, 145, or 150 bp, is a single index built with an overhang of 151 - 1 = 150 still usable for all of the samples?
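The arithmetic in this example can be checked with a short Python sketch: an index built from the untrimmed length uses 151 - 1 = 150, which exceeds each sample's own max(readLength) - 1 by between 1 and 11. The function and sample names are hypothetical.

```python
def overhang_gaps(index_overhang, per_sample_max_len):
    """Difference between one index's sjdbOverhang and each sample's
    ideal value, max(readLength) - 1 (positive means the index is larger)."""
    return {
        name: index_overhang - (max_len - 1)
        for name, max_len in per_sample_max_len.items()
    }


# Per-sample maxima after trimming, taken from the question.
trimmed_max = {"s1": 140, "s2": 142, "s3": 144, "s4": 145, "s5": 150}
gaps = overhang_gaps(151 - 1, trimmed_max)
print(gaps)  # {'s1': 11, 's2': 9, 's3': 7, 's4': 6, 's5': 1}
```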

Alexander Dobin

May 19, 2023, 7:31:38 AM

to rna-star

Hi,

I would recommend using the default value (100). This parameter typically has only a very marginal effect on alignments.
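A sketch of this advice in pipeline form: build one index with the default `--sjdbOverhang 100` and reuse the same `--genomeDir` for every sample, whatever its trimmed read length. The helpers below only assemble STAR argument lists (pass one to `subprocess.run(cmd, check=True)` to execute it); the function names, thread count, and all file paths are hypothetical assumptions.

```python
import shlex


def genome_generate_cmd(genome_dir, fasta, gtf, sjdb_overhang=100, threads=8):
    """Argument list for building one STAR index (default sjdbOverhang = 100)."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang),
    ]


def align_cmd(genome_dir, reads, out_prefix, threads=8):
    """Argument list for aligning one sample against the shared index."""
    cmd = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
    ]
    if reads[0].endswith(".gz"):
        # STAR reads compressed input through an external command such as zcat.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: one index, reused for samples of any trimmed length.
    index = "star_index"
    print(shlex.join(genome_generate_cmd(index, "genome.fa", "genes.gtf")))
    samples = {"sampleA": ["sampleA_R1.fastq.gz"], "sampleB": ["sampleB_R1.fastq.gz"]}
    for name, reads in samples.items():
        print(shlex.join(align_cmd(index, reads, f"{name}_")))
```

Building the index once keeps the pipeline simple: read-length variation between samples then only affects the alignment step, not index generation.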
