IncFormLen & SkipFormLen

2,634 views
Skip to first unread message

JohnT

unread,
May 21, 2015, 3:33:12 AM5/21/15
to rmats-us...@googlegroups.com
Hi All,

Sorry if this is a newbie question but I cant seem to figure out how the IncFormLen & SkipFormLen is calculated by MATS?

I also understand that these are used in some kind of a normalization when calculating the PSI (IncLevel's) but its not apparent to me if this is the case for: 1) junction counts only and/or 2) junction counts with reads on target?

Hoping someone can clarify and thank you in advance.

John

JohnT

unread,
Aug 6, 2015, 12:55:09 AM8/6/15
to rMATS User Group

Hi Again,

Following on from my initial question I'm hoping someone might still be able to help me with this so I'm adding more detail. Below is a single line of output from the SE.MATS.JunctionCountOnly file (converted to read easier in this post) for an analysis run with two replicates for each sample (complete command below):

GeneID: OSBPL3
chr: chr7
strand: -
exonStart_0base: 24902818
exonEnd: 24902911
upstreamES: 24901231
upstreamEE: 24901388
downstreamES: 24903114
downstreamEE: 24903218
IJC_SAMPLE_1: 51,52
SJC_SAMPLE_1:
107,70
IJC_SAMPLE_2: 86,129
SJC_SAMPLE_2: 15,14
IncFormLen: 172
SkipFormLen:
86
PValue: 0
FDR: 0
IncLevel1: 0.192,0.271
IncLevel2: 0.741,0.822
IncLevelDifference: -0.55

I understand why and how IncFormLen and SkipFormLen are used to normalize the isoform-specific read counts as stated in the 'SI materials and methods' document of the Shen et al 2014, PNAS publication. For example; In the above case we use these factors to normalise read counts when calculating inclusion levels like so;

ψ = (I/LI) / (I/LI + S/LS)
where: I = number of reads mapped to the exon inclusion isoform, S = number of reads mapped to the exon skipping isoform, LI = effective length of the exon inclusion isoform, LS = effective length of the exon skipping isoform.
Therefore, for IncLevel1:
(51/172) / (51/172 + 107/86) = 0.192 and (52/172) / (52/172 + 70/86) = 0.271

According to Fig. S1 from the same SI document, the authors show that for skipped exon events, LI and LS are calculated like so:
LI = 2( j - r + 1 )
LS =  j - r + 1
where: j = junction length, r = read length = 101.

Given all the above information I can easily calculate what the junction length ( j ) must be to get LI and LS in this situation. However, I cant seem to work out how this parameter
( j ) is derived from the raw data? I have been experimenting with all sorts of calculations using exonStart_0base, exonEnd, upstreamES, upstreamEE, downstreamES and downstreamEE but no luck! Can anyone calrify how junction length ( j ) is calculated?
 

I hope I have expressed this problem in a clear, understandable manner and apologise if not. Again, thanks in advance!

John  

rMATS comand line:
python /data/sacgf/tools/rMATS.3.0.9/RNASeq-MATS.py -b1 ${Control1},${Control2} -b2 ${Treatment1},${Treatment2} -gtf ${Hg19_genes_gtf} -o ${OUTDIR} -t paired -len 101 -analysis U -r1 214.3,209.0 -sd1 68.8,117.0 -r2 229.8,194.9 -sd2 76.2,61.0

xinhui...@gmail.com

unread,
Aug 18, 2015, 9:42:16 PM8/18/15
to rMATS User Group
Hi,johnT
   I got confused with the same problem, did you find out the method calculate j (junction length) ?
cheers,
Xinhui Liao

在 2015年8月6日星期四 UTC+8下午12:55:09,JohnT写道:

JohnT

unread,
Aug 20, 2015, 7:46:28 PM8/20/15
to rMATS User Group
Hi Xinhui,

Not yet, Sorry!

I'm not sure I have made my point clear enough as I thought my question would have been answered by now. If I come up with a solution I'll be sure to let you know.

Cheers

John

JohnT

unread,
Aug 20, 2015, 7:51:12 PM8/20/15
to rMATS User Group
Hi Xinhui,

Not yet, Sorry!

Cheers

John

Shen Shihao

unread,
Aug 21, 2015, 11:32:49 AM8/21/15
to rMATS User Group
The junction length j is the overall junction region covered by reads across junctions, affected by read length, anchor length (by default 8 bps in both upstream and downstream exons) and exon length.

In your example, the junction length is 186 for the skipping junction. It is calculated as (read length - anchor)*2 = (101-8)*2=186.

If in another case the exon is shorter than read length - anchor, the junction lengh will be reduced.

JohnT

unread,
Sep 8, 2015, 1:15:51 AM9/8/15
to rMATS User Group
Brilliant, Thank you.

Very much appreciated and now understood!

Hailei

unread,
Oct 1, 2015, 11:23:48 AM10/1/15
to rMATS User Group
I am still confuse on calculating the junction length j.

My results show that IncFormLen = 169 which means j = 169/2-1+100=183.5. I could not understand why j is not integer ?

Thanks,
Hailei

SHIHAO SHEN

unread,
Oct 1, 2015, 1:10:37 PM10/1/15
to Hailei, rMATS User Group
Hi Hailei,

In your case, the junction length j is calculated as j = IncFormLen(169) + read_length - 1.

Thanks,
Shihao



--
You received this message because you are subscribed to the Google Groups "rMATS User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rmats-user-gro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rmats-user-group/c46a41f6-359e-47b5-8fa9-069e9d1dfc4a%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Hailei Zhang

unread,
Oct 1, 2015, 1:28:50 PM10/1/15
to SHIHAO SHEN, rMATS User Group
Thank you Shihao. I am confused based on your answer. 

For my understanding, we should decide junction size first, then calculate IncFormLen. From your rMATs paper (Fig S1), IncFormLen= 2*( j-readlength+1). 

Why my case use different formula? 

Thanks,

Hailei



SHIHAO SHEN

unread,
Oct 1, 2015, 1:37:52 PM10/1/15
to Hailei Zhang, rMATS User Group
Hi Hailei,

Could you please tell me more details about your dataset? Maybe I haven't fully understood your case.

Is the IncFormLen derived from using just inclusion junctions or inclusion junctions + exon reads?

What are the read length and anchor length in your analysis?

Do you have 169 IncFormLen for the majority the exons or just for a few exons? If the flanking exons are too short, some skipped exons can have trimmed IncFormLen.

Thanks,
Shihao

Hailei Zhang

unread,
Oct 1, 2015, 1:49:43 PM10/1/15
to SHIHAO SHEN, rMATS User Group
Hi Shihao,

The following result is from the output file of  MXE.MATS.JunctionCountOnly.txt.

My input read length is 100, the anchor is default value 8. 

Based on your paper: the j= 2*(100-8)=184, Then IncFormLen= 2*(j-r+1)=2*(184-100+1)=170.

But we saw some other value (169,156,143) which are different from 170.

Thanks
Hailei

IJC_SAMPLE_1  SJC_SAMPLE_1  IJC_SAMPLE_2  SJC_SAMPLE_2  IncFormLen  SkipFormLen
241,308 35,58 331,377 21,19 170 170
117,241 320,450 62,109 333,436 170 170
16,40 86,104 6,8 89,116 170 130
352,549 333,491 233,314 449,438 170 170
352,549 333,492 233,314 449,437 170 170
34,51 22,42 59,57 10,15 170 170
63,69 35,31 32,51 43,67 169 169
31,54 19,50 12,24 32,65 170 162
29,32 34,45 64,50 24,26 156 170
16,11 34,32 4,10 86,72 143 143
17,14 45,46 3,7 67,72 170 142
52,101 31,57 41,43 51,67 170 170
44,45 78,80 78,72 54,65 170 170

SHIHAO SHEN

unread,
Oct 1, 2015, 2:10:49 PM10/1/15
to Hailei Zhang, rMATS User Group
Hi Hailei,

Thanks for the details.

In the cases that some exons have shorter inclusion or skipping form length than the others, these are caused by insufficient exon length.

To archive 184 junction length for both inclusion junctions, it requires at least 92 bps for the upstream, middle and downstream exons. If either one of the exons is shorter than the required length, it will lead to a trimmed isoform length.

Please disregard my first reply. I misunderstood your case in that post.

Shihao



Hailei Zhang

unread,
Oct 1, 2015, 2:16:47 PM10/1/15
to SHIHAO SHEN, rMATS User Group
Thank you for your answer. I understand now. 

Hailei

Satyajit Rajapurkar

unread,
Mar 10, 2016, 1:04:00 PM3/10/16
to rMATS User Group, shi...@ucla.edu

Hi all,
Sorry for rehashing this topic again but I have an even more fundamental question which has confused me. Does a higher exon inclusion level value for say, Skipped Exons, mean that the proportion of skipped exons is low and vice-versa? If this is true, then for Retained Introns, does a higher exon inclusion level mean I have MORE incidence of Retained introns?

-Satyajit

Juw Won Park

unread,
Mar 22, 2016, 1:34:49 PM3/22/16
to rMATS User Group
Hello,

The inclusion level is always about the alternatively spliced region. High inclusion level in SE event means more usage of the middle exon (target exon or cassette exon) and high inclusion level in RI means more usage of the intron.

Thank you,
Juw Won

Satyajit Rajapurkar

unread,
Mar 23, 2016, 5:30:06 PM3/23/16
to rMATS User Group
Sorry I am unsure by what you mean by usage here. Do you mean to say that if, in an SE event, I have a high inclusion level, it means I have more transcripts with the "middle" exon than without? 

Juw Won Park

unread,
May 2, 2016, 10:57:01 PM5/2/16
to rMATS User Group
Yes, exactly.

Fei-man Hsu

unread,
Nov 30, 2016, 1:02:41 AM11/30/16
to rMATS User Group
Hello,

Sorry for coming back to this topic.
I just got the results from rMATS 3.2.5.
But I'm confused about the SE.MATS.JunctionCountOnly.txt file. From the above discussion I know where are the IncFormLen and SkipFormLen from.
Here is my case:
IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen
43,43,76 1,0,2 19,18 24,25 138 69
65,57,130 0,0,1 30,41 12,17 138 69
122,56,37 598,241,200 1,0 173,248 138 69
258,112,135 59,18,26 200,254 0,1 138 69
512,657,356 1,2,1 289,426 172,208 125 69
1,211,606,753 127,54,104 513,611 189,237 125 69
227,153,128 231,194,145 522,830 136,130 128 69
470,253,356 301,212,327 49,75 240,336 138 69
106,89,83 8,7,10 62,94 60,83 138 69
514,431,783 18,9,31 363,708 59,109 125 69

I used trimFastq.py to trim my reads into 70bp. The anchor length is as the default 8.
So according to the paper, the junction length should be j = (70 - 8) *2 = 124; and the IncFormLen should be 2*(124-70+1) = 110
Why I got values over 110?

Thanks
Fei-Man

Juw Won Park於 2016年5月3日星期二 UTC+9上午11時57分01秒寫道:

Juw Won Park

unread,
Nov 30, 2016, 9:26:59 AM11/30/16
to rMATS User Group
Hello,

The anchor length is used only for mapping. After the mapping, rMATS uses all the reads that span the junctions (i.e., no anchor length requirement). Since your read length is 70bp, the inclusion form length = 2*(70-1) = 138. If the cassette exon is smaller than the read length, 70bp in your case, the incFormLen could be smaller such as 125 in your example. 

-Juw Won

YI XING

unread,
Dec 1, 2016, 2:35:51 AM12/1/16
to Juw Won Park, rMATS User Group

Hi Fei-man,

I’d like to add that this is due to a recent update to rMATS 3.2.X in terms of how we count reads and define the effective lengths of different splice forms. This is why the effective length is calculated differently now from the formula provided in the original rMATS paper.

 

Yi

Fei-man Hsu

unread,
Dec 1, 2016, 2:55:51 AM12/1/16
to rMATS User Group, rmat...@gmail.com
Thanks Juw Won and Yi for the reply.


Yi Xing於 2016年12月1日星期四 UTC+9下午4時35分51秒寫道:

地藏菩薩戴斗笠的

unread,
Nov 6, 2021, 5:59:54 AM11/6/21
to rMATS User Group
Hello all,

Sorry for rehashing this topic. I'm new to using rMATS for splicing analysis. I understand the idea of effective length and PSI calculation. However, at the mapping step, I don't understand why the anchor length has been reduced from 8bp (as used in (Shen et al, 2014) to 1bp (as described on GitHub rMATS-turbo) by default? Why would people not worry that this might be too short of an anchor to ensure correct mapping of splicing junction?

Many thanks for your time and kind help in advance!

Best,
Juli Wang

kutsc...@gmail.com

unread,
Nov 11, 2021, 11:44:03 AM11/11/21
to rMATS User Group
There are actually 2 anchor length parameters to rMATS-turbo. https://github.com/Xinglab/rmats-turbo/tree/v4.1.1#all-arguments

--anchorLength ANCHORLENGTH
                      The anchor length. Default is 1
--tophatAnchor TOPHATANCHOR
                      The "anchor length" or "overhang length" used in the
                      aligner. At least "anchor length" NT must be mapped to
                      each end of a given junction. The default is 6. (Only
                      if using fastq)

--tophatAnchor (default 6) is passed to STAR as --alignSJDBoverhangMin (used for annotated splice junctions) so the anchor used for mapping is not much shorter than the previous value of 8 (although I'm not sure why or at what time the value changed)

--anchorLength (default 1) is used by rmats to validate junction reads when it is parsing the bam files which are either supplied directly with --b1 --b2 or are the result of the mapping step if rMATS is started from fastq files. The default of 1 basically accepts any junction output by the aligner

--anchorLength is also used as part of the effective length calculation. There are actually multiple anchor lengths used by the STAR aligner so it is not obvious what anchor length to use for the effective length calculation. The other STAR anchor length is the required anchor length for unannotated splice junctions (--alignSJoverhangMin which defaults to 5)

Similar to this post (https://groups.google.com/g/rmats-user-group/c/ZCxjlQfP9ak/m/PaO_skpQAgAJ), we tested small changes to the anchor length and found that it only has a very small impact on the final results

There does seem to be some room for improvement in the way that rMATS handles anchor lengths

Eric
Reply all
Reply to author
Forward
0 new messages