Update SJ.out.tab after filtering results?

703 views
Skip to first unread message
Assigned to ado...@gmail.com by me

Lorenzo Calviello

unread,
May 15, 2014, 5:37:03 AM5/15/14
to rna-...@googlegroups.com
Hi guys,

First of all thank you for this great super-fast tool !!!

My question is the following:

In my first round of alignment I allowed up to 20 multiple mapping position for each read (I have short reads), but for subsequent analysis I only use 1 alignment per read (not only the uniquely mapping ones); I discard the secondary alignments with:

samtools view -b -F 0X100 *.bam > *_best.bam

The point is that now the SJ.out.tab file includes the Total number of multi-mapping reads per each splice site, and I would like to obtain a new SJ.out.tab that reports the amount of multi-mapping reads from the filtered alignment file.

I was wandering if it is possible to create a new SJ.out.tab from a modified sam/bam alignment file. I know it is possible to do the other way around, but apparently to get a new SJ.out.tab file I should re-align my sequences.

Is there a quick way to update/create a new a SJ.out.tab from a bam/sam file ?


I hope my question was clear!

Thank you in advance and keep up the wonderful work !


Best,

Lorenzo

Alexander Dobin

unread,
May 19, 2014, 3:47:55 PM5/19/14
to rna-...@googlegroups.com
Hi Lorenzo,

I have a simple awk scripts that extracts junctions from SAM
$ awk -f sjFromSAMcollapseUandM.awk Aligned.out.sam  | sort -V > SJ
SJ will contain columns 1,2,3,7,8 of the SJ.out.tab, i.e. intron coordinates and unique/multiple counts.

Cheers
Alex

Lorenzo Calviello

unread,
May 26, 2014, 8:03:18 AM5/26/14
to rna-...@googlegroups.com
Thank you Alex, that worked perfectly !!!
Congrats again !!!

Best,
Lorenzo

craigc...@gmail.com

unread,
Nov 7, 2014, 2:01:56 PM11/7/14
to rna-...@googlegroups.com
Hi Alex,

Along the lines in this post, I was wondering if you would consider adding the option to STAR execution to restrict the SAM/BAM output to only the primary read (for both unique and multimappers), rather than having to pipe to samtools to do this. I would be great to be able to restrict output in this way and still take advantage of your new options to output in sorted BAM format.

Also rather than use your custom script, is there an option in STAR to restrict the SJ.tab.out file based on the SAM/BAM alignments, now that you have included the option to output as sorted BAM? I know that there is an option for the reverse, to restrict the SAM/BAM based on SJ.tab.out, but I don't think I saw the reverse. Not sure if it would even be possible, but thought I'd ask. Reason being is that we have some junctions that are many (80-90) but all are from multimappers. We have been focusing on just uniquely mapped reads, so as to reduce false positives, but in doing so we are throwing out some true junctions.  We could re-capture some of those junctions that we are throwing out right now using this method in the post to output only the primary alignment from a read, then run your awk script to adjust the SJ.tab.out file based on the SAM file. 

Wonderful tool and many thanks,
Craig

Alexander Dobin

unread,
Nov 12, 2014, 10:16:07 AM11/12/14
to rna-...@googlegroups.com
Hi Craig,

I will add this option to my TODO list, I agree it will be useful in many situations. Note, that even now you can still use the BAM sorting, and then remove non-primary alignments from the sorted BAM file.
The SJ.out.tab output will be synchronized with BAM, basically only junctions from alignments written in BAM will be counted in SJ.out.tab. However, I am not sure if this is the best approach for discovering novel junctions supported by multi-mappers only, since you will be missing most of the multi-mapping alignments.

Cheers
Alex
Reply all
Reply to author
Forward
0 new messages