Splice junctions - number of splices

Benjy Jek Yang Tan

Dec 22, 2016, 11:39:40 AM
to rna-star
Hi, 

I am new to STAR and find it to be pretty good software. I was using TopHat previously.

Just one question: in the SJ.out.tab file, can someone explain the meaning of columns 7 and 8? That is, if I want to know the number of reads for each splice junction, which column should I use?

What I usually do with TopHat is view the .sam file in IGV and export the junctions.bed file, which gives me the read depth for each splice. I did the same with the .sam file generated by STAR, but the read depths I got differed from the numbers in the SJ.out.tab file.

Please advise. Thank you very much!

Alexander Dobin

Dec 22, 2016, 1:03:46 PM
to rna-star
Hi Benjy,

SJ.out.tab contains only the junctions that passed filtering with --outSJfilter* parameters.
You will get more junctions if you extract them directly from the SAM file.

Column 7 is the number of uniquely mapped reads per junction, while column 8 is the number of multi-mapping reads.
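
A minimal sketch of pulling those two columns out, assuming the usual tab-separated SJ.out.tab layout in which columns 1 to 3 are the chromosome, intron start and intron end. The function name sj_counts is made up for illustration and is not part of STAR.

```shell
# Print chromosome, intron start, intron end, unique-read count (column 7)
# and multi-mapper count (column 8) for every junction in an SJ.out.tab file.
# sj_counts is a hypothetical helper name, not part of STAR.
sj_counts() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $7, $8 }' "$1"
}

# Usage: sj_counts SJ.out.tab
```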

Note that for paired-end reads, if both mates overlap the same junction, the read is counted only once.

If you take all of the above into account, you should get the same counts from the SAM file.
I have a simple awk script, extras/scripts/sjFromSAMcollapseUandM.awk in the STAR source, that extracts these counts from the SAM file.
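
The counting rules above (unique vs. multi-mapping reads, and both mates of a pair counted once per junction) can be sketched roughly as follows. This is a simplified illustration, not the sjFromSAMcollapseUandM.awk script itself. It assumes the NH tag is present in the SAM records and uses it to separate unique reads (NH:i:1) from multi-mappers, it applies none of the --outSJfilter* filtering, and the function name sj_from_sam is made up.

```shell
# Count splice junctions from SAM text on stdin (a simplified sketch, not the
# script shipped with STAR). Output columns: chr, intron start, intron end,
# unique-read count, multi-mapper count. Output order is unspecified.
sj_from_sam() {
    awk '
    BEGIN { FS = OFS = "\t" }
    /^@/ { next }                      # skip SAM header lines
    {
        # Assumption: NH:i:1 means uniquely mapped; a missing tag is treated as 1
        nh = 1
        for (i = 12; i <= NF; i++)
            if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0

        pos = $4                       # 1-based leftmost reference position
        cigar = $6
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op  = substr(cigar, RLENGTH, 1)
            cigar = substr(cigar, RLENGTH + 1)
            if (op == "N") {
                # pos is now the first intron base; the intron spans len bases
                sj  = $3 OFS pos OFS (pos + len - 1)
                key = $1 SUBSEP sj
                # count each read name once per junction, so two mates
                # spanning the same junction add a single count
                if (!(key in seen)) {
                    seen[key] = 1
                    all[sj] = 1
                    if (nh == 1) uniq[sj]++; else multi[sj]++
                }
            }
            # M, D, N, = and X consume reference bases; I, S, H and P do not
            if (op ~ /[MDN=X]/) pos += len
        }
    }
    END {
        for (sj in all) print sj, uniq[sj] + 0, multi[sj] + 0
    }'
}

# Usage: sj_from_sam < Aligned.out.sam | sort > sj_sketch.tab
```

Because no junction filtering is applied here, the output would be expected to list more junctions than SJ.out.tab.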

Cheers
Alex

Benjy Jek Yang Tan

Dec 28, 2016, 3:25:36 AM
to rna-star
Thank you for the explanation. I understand now.

I tried running the script both with the awk command and as a shell script, but could not get it to work. Could you please guide me on how to use the script properly? Thanks!

Alexander Dobin

Jan 6, 2017, 2:38:26 PM
to rna-star
Hi Benjy,

For BAM files:
$ samtools view Aligned.out.bam | awk -f sjFromSAMcollapseUandM.awk > sj.tab
Columns 4 and 5 in the output file are the unique and multiple read counts, corresponding to columns 7 and 8 in STAR's SJ.out.tab file.
There is also a sjFromSAMcollapseUandM_inclOverlaps.awk script that counts the overlapping mates separately.
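
To check the two sets of counts against each other, something like the following sketch might help. It assumes that columns 1 to 3 of sj.tab hold the chromosome, intron start and intron end in the same convention as SJ.out.tab (an assumption, not stated above), and the function name sj_compare is made up.

```shell
# List junctions whose counts differ between STAR's SJ.out.tab ($1) and the
# sj.tab extracted from the alignments ($2), plus junctions absent from
# SJ.out.tab (shown with NA; these may simply have been filtered out).
# Output: chr, start, end, STAR unique, STAR multi, sj.tab unique, sj.tab multi
# sj_compare is a hypothetical helper name; the sj.tab layout is an assumption.
sj_compare() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == FNR { star[$1 OFS $2 OFS $3] = $7 OFS $8; next }
         {
             key = $1 OFS $2 OFS $3
             if (!(key in star))
                 print key, "NA", "NA", $4, $5
             else if (star[key] != $4 OFS $5)
                 print key, star[key], $4, $5
         }' "$1" "$2"
}

# Usage: sj_compare SJ.out.tab sj.tab
```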

Cheers
Alex