The SJ.out.tab information from STAR-align

alva james

Jun 7, 2016, 12:49:21 PM
to rna-star
Hello Alex,
Thanks for sharing STAR-align.

I would like to know whether the information in the *.SJ.out.tab file from STAR-align is used to produce the final BAM output, i.e. whether the splice junctions in that file are used in the output BAM file.

Also, the last column in the *SJ.out.tab file is described as 'maximum spliced alignment overhang'. From what I read, I understood it to be the read length minus 1, but I would like to know what it actually means. Is it the length of the read that is mapped to that position?

Thank you for the reply.
/A

Alexander Dobin

Jun 8, 2016, 5:27:54 PM
to rna-star
Hi Alva,

The SJ.out.tab file contains the filtered splice junctions detected in the mapping, i.e. if you extract the junctions from the BAM and filter them according to the --outSJfilter* filters, you would get the SJ.out.tab file.
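For illustration, extracting junctions from a BAM comes down to reading the N (skipped region) operations of each CIGAR string. Below is a minimal Python sketch, assuming the 1-based POS and the CIGAR string of each alignment are already available (e.g. from `samtools view`); `junctions_from_cigar` is a hypothetical name. It applies none of the --outSJfilter* filters and assigns no strand or motif, so its output corresponds to the unfiltered junctions rather than to SJ.out.tab itself. Intron coordinates follow the SJ.out.tab convention for columns 2 and 3: first and last intronic base, 1-based.

```python
import re

# CIGAR operations that consume reference bases (SAM spec): M, D, N, =, X
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based and inclusive,
    for every N operation in a CIGAR string. `pos` is the 1-based
    leftmost mapping position (SAM POS field)."""
    junctions = []
    ref = pos  # 1-based reference coordinate of the next reference base
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            # The skipped reference region is the intron
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


# The spliced read from this thread, assuming it starts at position 100
print(junctions_from_cigar(100, "8M10N4M"))  # [(108, 117)]
```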

The overhang in this case is the alignment overhang, i.e. if a read is spliced as
ACGTACGT----------ACGT
the overhang is 4, and then the maximum overhang over all the reads crossing this junction is reported in SJ.out.tab.
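A sketch of the per-read overhang for the example above, which corresponds to the CIGAR 8M10N4M. It uses the definition Alex gives later in this thread (the minimum of the donor and acceptor segment lengths). Two assumptions are not taken from STAR: a flank counts only aligned bases (M, =, X), and for a read with several junctions a flank ends at the neighboring junction. The function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED = set("M=X")  # assumption: S, H, I, D, P do not count toward a flank


def junction_flanks(cigar):
    """For each N in the CIGAR, return (left, right): aligned read bases
    between this junction and the previous junction (or read start),
    and between it and the next junction (or read end)."""
    blocks = [0]  # aligned-base count of each block separated by N
    for length, op in CIGAR_RE.findall(cigar):
        if op == "N":
            blocks.append(0)
        elif op in ALIGNED:
            blocks[-1] += int(length)
    # Junction i sits between block i and block i + 1
    return [(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]


def read_overhangs(cigar):
    """Per-junction overhang of one read: the shorter of its two flanks."""
    return [min(left, right) for left, right in junction_flanks(cigar)]


# ACGTACGT----------ACGT  ->  8M10N4M
print(junction_flanks("8M10N4M"))  # [(8, 4)]
print(read_overhangs("8M10N4M"))   # [4]
```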

Cheers
Alex

alva james

Jul 4, 2016, 7:20:44 AM
to rna-star
Thanks Alex

Samuel Rivero

Mar 16, 2017, 3:14:35 PM
to rna-star
Hi Alex,

Is it possible to get both junction overhangs (the maximum) from any of the output files?

In your example, ACGTACGT----------ACGT, it would be 8 and 4.

Thanks

Sam

Alexander Dobin

Mar 16, 2017, 3:20:24 PM
to rna-star
Hi Samuel,

You would need to extract the overhangs from the CIGAR strings in the SAM/BAM file.
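Following this suggestion, here is a minimal Python sketch that keeps the maximum left and right flank separately for each junction. It assumes the alignments were already read from the BAM as (chromosome, 1-based POS, CIGAR) tuples, e.g. with pysam via `reference_name`, `reference_start + 1` and `cigarstring`. The function names and the flank-counting rule (aligned M/=/X bases up to the neighboring junction or read end) are assumptions, not STAR's definition. "Left" and "right" are in genomic orientation, so for a minus-strand junction the donor is the right flank.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # advance the reference coordinate
ALIGNED = set("M=X")          # assumption: only these count toward a flank


def spliced_segments(pos, cigar):
    """Yield (intron_start, intron_end, left_flank, right_flank) per N."""
    ref = pos
    blocks, introns = [0], []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
            blocks.append(0)
        elif op in ALIGNED:
            blocks[-1] += length
        if op in REF_CONSUMING:
            ref += length
    for i, (start, end) in enumerate(introns):
        yield start, end, blocks[i], blocks[i + 1]


def max_flanks(alignments):
    """alignments: iterable of (chrom, pos, cigar).
    Returns {(chrom, intron_start, intron_end): [max_left, max_right]}."""
    best = defaultdict(lambda: [0, 0])
    for chrom, pos, cigar in alignments:
        for start, end, left, right in spliced_segments(pos, cigar):
            entry = best[(chrom, start, end)]
            entry[0] = max(entry[0], left)
            entry[1] = max(entry[1], right)
    return dict(best)


reads = [
    ("chr1", 100, "8M10N4M"),  # ACGTACGT----------ACGT
    ("chr1", 103, "5M10N9M"),  # hypothetical read over the same junction
]
print(max_flanks(reads))  # {('chr1', 108, 117): [8, 9]}
```

The two maxima can come from different reads, as in this example, so they describe the junction rather than any single alignment.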

Cheers
Alex

Drwhit

Apr 4, 2018, 5:44:33 PM
to rna-star
Alex, 

Thank you for STAR. 

Can you say why the maximum overhang would be a more useful statistic than the minimum overhang? It seems that the minimum overhang would be more important in determining uncertainty. Also, are multimapping reads considered when calculating the maximum overhang?

Thanks,
Adam

Alexander Dobin

Apr 7, 2018, 4:14:59 PM
to rna-star
Hi Adam,

The overhang for each spliced read is calculated as the minimum of the donor and acceptor segment lengths.
Then, for all reads spliced over the same junction, the maximum overhang is reported, because we want to know the most confidently spliced read. The minimum overhang would only tell us that there is some read that is not spliced very confidently.
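The max-of-min rule can be illustrated with a few hypothetical reads crossing one junction, each given as its (donor, acceptor) segment lengths. This is a sketch of the rule as described here, not STAR's code; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def read_overhang(donor_len: int, acceptor_len: int) -> int:
    """Overhang of one spliced read: the shorter of its two segments."""
    return min(donor_len, acceptor_len)


def junction_max_overhang(flanks: Iterable[Tuple[int, int]]) -> int:
    """Maximum per-read overhang over all reads crossing one junction,
    i.e. the overhang of the most confidently spliced read."""
    return max(read_overhang(d, a) for d, a in flanks)


# Three hypothetical reads over the same junction: (donor, acceptor) lengths
flanks = [(8, 4), (2, 10), (6, 6)]
print([read_overhang(d, a) for d, a in flanks])      # [4, 2, 6]
print(junction_max_overhang(flanks))                 # 6
print(min(read_overhang(d, a) for d, a in flanks))   # 2
```

A reported value of 6 means at least one read is anchored by 6 or more bases on both sides of the junction; the minimum (2) would only reveal the weakest read.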

Cheers
Alex