IncFormLen & SkipFormLen

JohnT

unread,

May 21, 2015, 3:33:12 AM5/21/15

to rmats-us...@googlegroups.com

Hi All,

Sorry if this is a newbie question but I cant seem to figure out how the IncFormLen & SkipFormLen is calculated by MATS?

I also understand that these are used in some kind of a normalization when calculating the PSI (IncLevel's) but its not apparent to me if this is the case for: 1) junction counts only and/or 2) junction counts with reads on target?

Hoping someone can clarify and thank you in advance.

John

JohnT

unread,

Aug 6, 2015, 12:55:09 AM8/6/15

to rMATS User Group

Hi Again,

Following on from my initial question I'm hoping someone might still be able to help me with this so I'm adding more detail. Below is a single line of output from the SE.MATS.JunctionCountOnly file (converted to read easier in this post) for an analysis run with two replicates for each sample (complete command below):

GeneID: OSBPL3
chr: chr7
strand: -
exonStart_0base: 24902818
exonEnd: 24902911
upstreamES: 24901231
upstreamEE: 24901388
downstreamES: 24903114
downstreamEE: 24903218
IJC_SAMPLE_1: 51,52
SJC_SAMPLE_1: 107,70
IJC_SAMPLE_2: 86,129
SJC_SAMPLE_2: 15,14
IncFormLen: 172
SkipFormLen: 86
PValue: 0
FDR: 0
IncLevel1: 0.192,0.271
IncLevel2: 0.741,0.822
IncLevelDifference: -0.55

I understand why and how IncFormLen and SkipFormLen are used to normalize the isoform-specific read counts as stated in the 'SI materials and methods' document of the Shen et al 2014, PNAS publication. For example; In the above case we use these factors to normalise read counts when calculating inclusion levels like so;

ψ = (I/LI) / (I/LI + S/LS)
where: I = number of reads mapped to the exon inclusion isoform, S = number of reads mapped to the exon skipping isoform, LI = effective length of the exon inclusion isoform, LS = effective length of the exon skipping isoform.
Therefore, for IncLevel1:
(51/172) / (51/172 + 107/86) = 0.192 and (52/172) / (52/172 + 70/86) = 0.271

According to Fig. S1 from the same SI document, the authors show that for skipped exon events, LI and LS are calculated like so:
LI = 2( j - r + 1 )
LS = j - r + 1
where: j = junction length, r = read length = 101.

Given all the above information I can easily calculate what the junction length ( j ) must be to get LI and LS in this situation. However, I cant seem to work out how this parameter ( j ) is derived from the raw data? I have been experimenting with all sorts of calculations using exonStart_0base, exonEnd, upstreamES, upstreamEE, downstreamES and downstreamEE but no luck! Can anyone calrify how junction length ( j ) is calculated?

I hope I have expressed this problem in a clear, understandable manner and apologise if not. Again, thanks in advance!

John

rMATS comand line:

python /data/sacgf/tools/rMATS.3.0.9/RNASeq-MATS.py -b1 ${Control1},${Control2} -b2 ${Treatment1},${Treatment2} -gtf ${Hg19_genes_gtf} -o ${OUTDIR} -t paired -len 101 -analysis U -r1 214.3,209.0 -sd1 68.8,117.0 -r2 229.8,194.9 -sd2 76.2,61.0

xinhui...@gmail.com

unread,

Aug 18, 2015, 9:42:16 PM8/18/15

to rMATS User Group

Hi,johnT
I got confused with the same problem, did you find out the method calculate j (junction length) ?

cheers,

Xinhui Liao

在 2015年8月6日星期四 UTC+8下午12:55:09，JohnT写道：

JohnT

unread,

Aug 20, 2015, 7:46:28 PM8/20/15

to rMATS User Group

Hi Xinhui,

Not yet, Sorry!

I'm not sure I have made my point clear enough as I thought my question would have been answered by now. If I come up with a solution I'll be sure to let you know.

Cheers

John

JohnT

unread,

Aug 20, 2015, 7:51:12 PM8/20/15

to rMATS User Group

Hi Xinhui,

Not yet, Sorry!

Cheers

John

Shen Shihao

unread,

Aug 21, 2015, 11:32:49 AM8/21/15

to rMATS User Group

The junction length j is the overall junction region covered by reads across junctions, affected by read length, anchor length (by default 8 bps in both upstream and downstream exons) and exon length.

In your example, the junction length is 186 for the skipping junction. It is calculated as (read length - anchor)*2 = (101-8)*2=186.

If in another case the exon is shorter than read length - anchor, the junction lengh will be reduced.

JohnT

unread,

Sep 8, 2015, 1:15:51 AM9/8/15

to rMATS User Group

Brilliant, Thank you.

Very much appreciated and now understood!

Hailei

unread,

Oct 1, 2015, 11:23:48 AM10/1/15

to rMATS User Group

I am still confuse on calculating the junction length j.

My results show that IncFormLen = 169 which means j = 169/2-1+100=183.5. I could not understand why j is not integer ?

Thanks,

Hailei

SHIHAO SHEN

unread,

Oct 1, 2015, 1:10:37 PM10/1/15

to Hailei, rMATS User Group

Hi Hailei,

In your case, the junction length j is calculated as j = IncFormLen(169) + read_length - 1.

Thanks,

Shihao

--
You received this message because you are subscribed to the Google Groups "rMATS User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rmats-user-gro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rmats-user-group/c46a41f6-359e-47b5-8fa9-069e9d1dfc4a%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Hailei Zhang

unread,

Oct 1, 2015, 1:28:50 PM10/1/15

to SHIHAO SHEN, rMATS User Group

Thank you Shihao. I am confused based on your answer.

For my understanding, we should decide junction size first, then calculate IncFormLen. From your rMATs paper (Fig S1), IncFormLen= 2*( j-readlength+1).

Why my case use different formula?

Thanks,

Hailei

SHIHAO SHEN

unread,

Oct 1, 2015, 1:37:52 PM10/1/15

to Hailei Zhang, rMATS User Group

Hi Hailei,

Could you please tell me more details about your dataset? Maybe I haven't fully understood your case.

Is the IncFormLen derived from using just inclusion junctions or inclusion junctions + exon reads?

What are the read length and anchor length in your analysis?

Do you have 169 IncFormLen for the majority the exons or just for a few exons? If the flanking exons are too short, some skipped exons can have trimmed IncFormLen.

Thanks,

Shihao

Hailei Zhang

unread,

Oct 1, 2015, 1:49:43 PM10/1/15

to SHIHAO SHEN, rMATS User Group

Hi Shihao,

The following result is from the output file of MXE.MATS.JunctionCountOnly.txt.

My input read length is 100, the anchor is default value 8.

Based on your paper: the j= 2*(100-8)=184, Then IncFormLen= 2*(j-r+1)=2*(184-100+1)=170.

But we saw some other value (169,156,143) which are different from 170.

Thanks

Hailei

IJC_SAMPLE_1	SJC_SAMPLE_1	IJC_SAMPLE_2	SJC_SAMPLE_2	IncFormLen	SkipFormLen
241,308	35,58	331,377	21,19	170	170
117,241	320,450	62,109	333,436	170	170
16,40	86,104	6,8	89,116	170	130
352,549	333,491	233,314	449,438	170	170
352,549	333,492	233,314	449,437	170	170
34,51	22,42	59,57	10,15	170	170
63,69	35,31	32,51	43,67	169	169
31,54	19,50	12,24	32,65	170	162
29,32	34,45	64,50	24,26	156	170
16,11	34,32	4,10	86,72	143	143
17,14	45,46	3,7	67,72	170	142
52,101	31,57	41,43	51,67	170	170
44,45	78,80	78,72	54,65	170	170

SHIHAO SHEN

unread,

Oct 1, 2015, 2:10:49 PM10/1/15

to Hailei Zhang, rMATS User Group

Hi Hailei,

Thanks for the details.

In the cases that some exons have shorter inclusion or skipping form length than the others, these are caused by insufficient exon length.

To archive 184 junction length for both inclusion junctions, it requires at least 92 bps for the upstream, middle and downstream exons. If either one of the exons is shorter than the required length, it will lead to a trimmed isoform length.

Please disregard my first reply. I misunderstood your case in that post.

Shihao

Hailei Zhang

unread,

Oct 1, 2015, 2:16:47 PM10/1/15

to SHIHAO SHEN, rMATS User Group

Thank you for your answer. I understand now.

Hailei

Satyajit Rajapurkar

unread,

Mar 10, 2016, 1:04:00 PM3/10/16

to rMATS User Group, shi...@ucla.edu

Hi all,

Sorry for rehashing this topic again but I have an even more fundamental question which has confused me. Does a higher exon inclusion level value for say, Skipped Exons, mean that the proportion of skipped exons is low and vice-versa? If this is true, then for Retained Introns, does a higher exon inclusion level mean I have MORE incidence of Retained introns?

-Satyajit

Juw Won Park

unread,

Mar 22, 2016, 1:34:49 PM3/22/16

to rMATS User Group

Hello,

The inclusion level is always about the alternatively spliced region. High inclusion level in SE event means more usage of the middle exon (target exon or cassette exon) and high inclusion level in RI means more usage of the intron.

Thank you,

Juw Won

Satyajit Rajapurkar

unread,

Mar 23, 2016, 5:30:06 PM3/23/16

to rMATS User Group

Sorry I am unsure by what you mean by usage here. Do you mean to say that if, in an SE event, I have a high inclusion level, it means I have more transcripts with the "middle" exon than without?

Juw Won Park

unread,

May 2, 2016, 10:57:01 PM5/2/16

to rMATS User Group

Yes, exactly.

Fei-man Hsu

unread,

Nov 30, 2016, 1:02:41 AM11/30/16

to rMATS User Group

Hello,

Sorry for coming back to this topic.

I just got the results from rMATS 3.2.5.

But I'm confused about the SE.MATS.JunctionCountOnly.txt file. From the above discussion I know where are the IncFormLen and SkipFormLen from.

Here is my case:

IJC_SAMPLE_1

SJC_SAMPLE_1

IJC_SAMPLE_2

SJC_SAMPLE_2

IncFormLen

SkipFormLen

43,43,76	1,0,2	19,18	24,25	138	69
65,57,130	0,0,1	30,41	12,17	138	69
122,56,37	598,241,200	1,0	173,248	138	69
258,112,135	59,18,26	200,254	0,1	138	69
512,657,356	1,2,1	289,426	172,208	125	69
1,211,606,753	127,54,104	513,611	189,237	125	69
227,153,128	231,194,145	522,830	136,130	128	69
470,253,356	301,212,327	49,75	240,336	138	69
106,89,83	8,7,10	62,94	60,83	138	69
514,431,783	18,9,31	363,708	59,109	125	69

I used trimFastq.py to trim my reads into 70bp. The anchor length is as the default 8.

So according to the paper, the junction length should be j = (70 - 8) *2 = 124; and the IncFormLen should be 2*(124-70+1) = 110

Why I got values over 110?

Thanks

Fei-Man

Juw Won Park於 2016年5月3日星期二 UTC+9上午11時57分01秒寫道：

Juw Won Park

unread,

Nov 30, 2016, 9:26:59 AM11/30/16

to rMATS User Group

Hello,

The anchor length is used only for mapping. After the mapping, rMATS uses all the reads that span the junctions (i.e., no anchor length requirement). Since your read length is 70bp, the inclusion form length = 2*(70-1) = 138. If the cassette exon is smaller than the read length, 70bp in your case, the incFormLen could be smaller such as 125 in your example.

-Juw Won

YI XING

unread,

Dec 1, 2016, 2:35:51 AM12/1/16

to Juw Won Park, rMATS User Group

Hi Fei-man,

I’d like to add that this is due to a recent update to rMATS 3.2.X in terms of how we count reads and define the effective lengths of different splice forms. This is why the effective length is calculated differently now from the formula provided in the original rMATS paper.

Yi

To view this discussion on the web visit https://groups.google.com/d/msgid/rmats-user-group/e49e36b3-e944-4445-a939-96dd851a572a%40googlegroups.com.

Fei-man Hsu

unread,

Dec 1, 2016, 2:55:51 AM12/1/16

to rMATS User Group, rmat...@gmail.com

Thanks Juw Won and Yi for the reply.

Yi Xing於 2016年12月1日星期四 UTC+9下午4時35分51秒寫道：

地藏菩薩戴斗笠的

unread,

Nov 6, 2021, 5:59:54 AM11/6/21

to rMATS User Group

Hello all,

Sorry for rehashing this topic. I'm new to using rMATS for splicing analysis. I understand the idea of effective length and PSI calculation. However, at the mapping step, I don't understand why the anchor length has been reduced from 8bp (as used in (Shen et al, 2014) to 1bp (as described on GitHub rMATS-turbo) by default? Why would people not worry that this might be too short of an anchor to ensure correct mapping of splicing junction?

Many thanks for your time and kind help in advance!

Best,

Juli Wang

kutsc...@gmail.com

unread,

Nov 11, 2021, 11:44:03 AM11/11/21

to rMATS User Group

There are actually 2 anchor length parameters to rMATS-turbo. https://github.com/Xinglab/rmats-turbo/tree/v4.1.1#all-arguments

--anchorLength ANCHORLENGTH
                      The anchor length. Default is 1
--tophatAnchor TOPHATANCHOR
                      The "anchor length" or "overhang length" used in the
                      aligner. At least "anchor length" NT must be mapped to
                      each end of a given junction. The default is 6. (Only
                      if using fastq)

--tophatAnchor (default 6) is passed to STAR as --alignSJDBoverhangMin (used for annotated splice junctions) so the anchor used for mapping is not much shorter than the previous value of 8 (although I'm not sure why or at what time the value changed)

--anchorLength (default 1) is used by rmats to validate junction reads when it is parsing the bam files which are either supplied directly with --b1 --b2 or are the result of the mapping step if rMATS is started from fastq files. The default of 1 basically accepts any junction output by the aligner

--anchorLength is also used as part of the effective length calculation. There are actually multiple anchor lengths used by the STAR aligner so it is not obvious what anchor length to use for the effective length calculation. The other STAR anchor length is the required anchor length for unannotated splice junctions (--alignSJoverhangMin which defaults to 5)

Similar to this post (https://groups.google.com/g/rmats-user-group/c/ZCxjlQfP9ak/m/PaO_skpQAgAJ), we tested small changes to the anchor length and found that it only has a very small impact on the final results

There does seem to be some room for improvement in the way that rMATS handles anchor lengths

Eric

Reply all

Reply to author

Forward