Do I understand the sjdbOverhang parameter correctly?


MMTerpstra

May 22, 2014, 11:02:13 AM
to rna-...@googlegroups.com

Can I make the following statement about the sjdbOverhang parameter? It is based on the fact that, in the index, the splice junctions are used as additional reference sequence for better mapping accuracy, so they count toward the number of valid mappings found and thus influence the MAPQ.

For sjdbOverhang, the best value to use is max read length - 1:

A value shorter than the max read length results in sloppy alignments near intron-exon borders but possibly better MAPQ (fewer read alignment positions).
A value equal to the max read length results in good alignments near intron-exon borders but possibly worse MAPQ (more read alignment positions) for shorter reads.
A value greater than the max read length results in good alignments near intron-exon borders but worse MAPQ (more read alignment positions) for all reads.

This also depends on the read length distribution, but let's say this is Illumina data (with 70%-90% max read len).

Sorry, I'm too lazy to review the code or try it out myself :)


Alexander Dobin

May 23, 2014, 9:42:10 AM
to rna-...@googlegroups.com
Hi,

Your statement is close to the truth.
sjdbOverhang is the number of bases taken from both sides of the junction and joined together to form an additional "junction" sequence for seed search.
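
To make the construction concrete, here is a toy Python sketch of how such a "junction" sequence could be assembled from the two exon flanks. It is only an illustration of the idea described above, not STAR's actual code; the function name `junction_sequence`, its arguments, and the tiny example genome are all made up for this sketch.

```python
def junction_sequence(genome: str, donor_end: int, acceptor_start: int, overhang: int) -> str:
    """Join `overhang` exonic bases from each side of an intron (hypothetical helper).

    donor_end:      0-based index one past the last exonic base before the intron
    acceptor_start: 0-based index of the first exonic base after the intron
    """
    # Flank on the left (upstream) side, clipped at the start of the sequence.
    left = genome[max(0, donor_end - overhang):donor_end]
    # Flank on the right (downstream) side; slicing clips at the end automatically.
    right = genome[acceptor_start:acceptor_start + overhang]
    return left + right


# Toy genome: exon (8 bases) + intron (8 bases) + exon (8 bases).
genome = "AAAACCCC" + "GTTTTTAG" + "GGGGTTTT"
print(junction_sequence(genome, donor_end=8, acceptor_start=16, overhang=3))  # CCCGGG
```

With an overhang of 3, the inserted sequence holds 3 bases from each side, so a read crossing the junction can be seeded against it as long as it does not extend more than 3 bases past the junction on either side.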
sjdbOverhang = (readLength-1) is ideal to capture a read that has (readLength-1) bases on one side and 1 base on the other. However, since the reads will be sampled by --seedSearchStartLmax bases, even if sjdbOverhang < (readLength-1), the junctions with 1-base overhang will still be found.
On the other hand, if sjdbOverhang is too long, more seeds will become multi-mappers, since the junction sequences are redundant with the genome. However, STAR transforms the seed coordinates from junction coordinates to genome coordinates, and equivalent seeds are collapsed, so in the end it affects only a marginal population of reads.
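
As a small worked example of the (readLength-1) rule of thumb, the following Python sketch derives a value from a set of observed read lengths. The helper name `suggested_sjdb_overhang` and the example lengths are assumptions made for illustration; only the `--sjdbOverhang` option name comes from STAR itself.

```python
def suggested_sjdb_overhang(read_lengths):
    """Return max(read length) - 1, the rule of thumb discussed in this thread.

    Hypothetical helper: STAR does not compute this for you.
    """
    lengths = list(read_lengths)
    if not lengths:
        raise ValueError("need at least one read length")
    return max(lengths) - 1


# Example: trimmed Illumina reads with a maximum length of 101 bases.
overhang = suggested_sjdb_overhang([76, 90, 101])
print(f"--sjdbOverhang {overhang}")  # --sjdbOverhang 100
```

Using the maximum rather than the typical read length follows the reasoning above: a somewhat shorter value still finds 1-base overhangs, while a much longer one mainly adds redundant seeds.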
There is some more about it in this post.

Cheers
Alex