--outSJfilterDistToOtherSJmin

Mario Keller

Nov 8, 2023, 6:34:58 AM
to rna-star
Hi Alex,

I have a brief question regarding the --outSJfilterDistToOtherSJmin parameter.

The default values are 10 0 5 10 for non-canonical, GT/AG, GC/AG and AT/AC motifs, respectively.

In IGV I observed a novel splice junction that skips an exon. However, I do not see this junction in SJ.out.tab.

The event looks like this.
[  Exon1  ]GC_____AG[  Exon2  ]GT_____AG[  Exon3  ]

The junctions connecting Exon1+Exon2 and Exon2+Exon3 are annotated, while the junction connecting Exon1+Exon3 is novel.

I was wondering whether --outSJfilterDistToOtherSJmin is the reason I do not see the skipping junction in the output. As the skipping junction has the GC/AG motif, a value of 5 applies. As far as I understand the parameter, the donor and acceptor sites of the novel junction need to be at least 5 nt away from the donor/acceptor sites of any other junction. Could it be that the junction is not reported in SJ.out.tab because it shares its donor and acceptor sites with the junctions that include Exon2, so the distance is 0, which is < 5?

When setting all four values of --outSJfilterDistToOtherSJmin to 0 the skipping junction is reported in the output.
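To make the reasoning above concrete, here is a minimal Python sketch of a minimum-distance rule of this kind. It is a simplified model for illustration, not STAR's actual implementation: the function names and the coordinates are made up, and it only assumes that intron starts are compared with intron starts and intron ends with intron ends. The motif codes are those of column 5 in SJ.out.tab.

```python
# Simplified model of a minimum-distance filter; NOT STAR's actual code.
# Motif codes follow column 5 of SJ.out.tab:
# 0 non-canonical, 1 GT/AG, 2 CT/AC, 3 GC/AG, 4 CT/GC, 5 AT/AC, 6 GT/AT.
DEFAULT_MIN_DIST = (10, 0, 5, 10)  # non-canonical, GT/AG, GC/AG, AT/AC


def motif_class(motif_code):
    """Map a motif code (0-6) to an index (0-3) into the four parameter values."""
    return (motif_code + 1) // 2


def passes_distance_filter(candidate, others, min_dist=DEFAULT_MIN_DIST):
    """Return True if the candidate keeps the required distance to all others.

    candidate and others are (intron_start, intron_end, motif_code) tuples.
    """
    start, end, motif = candidate
    threshold = min_dist[motif_class(motif)]
    for other_start, other_end, _ in others:
        if abs(start - other_start) < threshold:
            return False
        if abs(end - other_end) < threshold:
            return False
    return True


# Hypothetical coordinates for the event drawn above.
exon1_exon2 = (1001, 2000, 3)  # GC/AG, annotated
exon2_exon3 = (2101, 3000, 1)  # GT/AG, annotated
skipping = (1001, 3000, 3)     # GC/AG, novel, shares both splice sites

neighbours = [exon1_exon2, exon2_exon3]
print(passes_distance_filter(skipping, neighbours))                # False
print(passes_distance_filter(skipping, neighbours, (0, 0, 0, 0)))  # True
```

With the defaults, the shared start position gives a distance of 0, which is below the GC/AG threshold of 5. With all four values at 0, no distance can fall below the threshold, so the junction passes.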

Thanks in advance.

Alexander Dobin

Nov 17, 2023, 3:27:12 PM
to rna-star
Hi Mario,

Your interpretation is correct!
By default, SJ.out.tab is filtered stringently against non-GT/AG junctions.

Mario Keller

Nov 17, 2023, 4:42:05 PM
to rna-star

Hi Alex,

thanks for the confirmation. While I understand the intuition behind the default parameters, don't you think that non-GT/AG exon-skipping junctions whose splice sites are annotated as part of two other junctions can be considered trustworthy? I think I will go for 10 0 0 0 and try to be more stringent in the post-processing.

Alexander Dobin

Dec 6, 2023, 3:56:54 PM
to rna-star
Hi Mario,

Yes, absolutely - reducing stringency at mapping and more careful post-processing is the way to go.
The default parameters are there as a guide and are based on some anecdotal evidence I used when I designed the algorithm.
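
As one possible shape for such post-processing, here is a hedged Python sketch that keeps a novel non-GT/AG junction only if both of its splice sites are also used by annotated junctions. The keep/drop policy, the `min_unique_reads` threshold, the function names, and the example rows are all assumptions for illustration. Only the column layout of SJ.out.tab (chromosome, intron start, intron end, strand, motif, annotated flag, unique reads, multi-mapping reads, maximum overhang) comes from STAR.

```python
def parse_sj_line(line):
    """Parse one tab-separated SJ.out.tab line into a dict."""
    f = line.rstrip("\n").split("\t")
    return {
        "chrom": f[0],
        "start": int(f[1]),
        "end": int(f[2]),
        "motif": int(f[4]),           # 0 non-canonical, 1/2 GT/AG, 3/4 GC/AG, 5/6 AT/AC
        "annotated": int(f[5]) == 1,  # 1 = present in the annotation
        "unique": int(f[6]),          # uniquely mapping reads crossing the junction
    }


def filter_junctions(junctions, min_unique_reads=3):
    """Hypothetical policy: keep annotated and GT/AG junctions as they are;
    keep other novel junctions only if both splice sites are annotated
    and enough unique reads support them."""
    ann_starts = {(j["chrom"], j["start"]) for j in junctions if j["annotated"]}
    ann_ends = {(j["chrom"], j["end"]) for j in junctions if j["annotated"]}
    kept = []
    for j in junctions:
        if j["annotated"] or j["motif"] in (1, 2):
            kept.append(j)
            continue
        shares_both = ((j["chrom"], j["start"]) in ann_starts
                       and (j["chrom"], j["end"]) in ann_ends)
        if shares_both and j["unique"] >= min_unique_reads:
            kept.append(j)
    return kept


# Made-up example rows in SJ.out.tab layout.
example = [
    "chr1\t1001\t2000\t1\t3\t1\t40\t0\t30",  # annotated GC/AG
    "chr1\t2101\t3000\t1\t1\t1\t55\t0\t30",  # annotated GT/AG
    "chr1\t1001\t3000\t1\t3\t0\t6\t0\t25",   # novel GC/AG, both sites annotated
    "chr1\t1500\t2600\t1\t0\t0\t8\t0\t20",   # novel non-canonical, no shared sites
]
kept = filter_junctions([parse_sj_line(line) for line in example])
print([(j["start"], j["end"]) for j in kept])
# [(1001, 2000), (2101, 3000), (1001, 3000)]
```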
