Confusion related to Output value in rMATS

laha.sayan...@gmail.com

unread,

Jun 16, 2021, 9:59:22 AM6/16/21

to rMATS User Group

Hi,

I have just run rmats with paired model. I am looking into the outputs for A3SS splicing event. I have sorted the FDR in descending to get the most significant events first.

However, my second entry has Inclevel1 and IncLevel2 values as (12 replicates in each group):

IncLevel1: 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0

IncLevel2:

1.0,1.0,1.0,0.999,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0

The computed inclusion level difference is 0 in this case.

This event is showing FDR = 4.83518780569625E-12. is this right?

P.S. I have looked into the Junction Counts and they have values like (only 5 are shown for brevity)

IJC_SAMPLE_1: 151, 129, 260, 98, 129

SJC_SAMPLE_1: 0,0,0,0,0

IJC_SAMPLE_2: 7100, 23500, 7576, 8069, 11191

SJC_SAMPLE_2: 2, 9, 0, 10, 4

So from this we can see that the inclusion isoform is present in both the conditions, but in condition1 it is more dominant compared to condition2. The skipping isoform is not epressed / meagerly expressed in both conditions. However, since this program determines alternative splicing between 2 conditions, judging from the IncLeveldifference value, how is it significant at all?

I was thinking of selecting candidates from the outputs based on FDR and IncLevelDifference. For instance if I impose a threshold of FDR <= 0.05 and |IncLevelDiff| >= 0.5 (say), will it be a appropriate? In gene expression studies, normally a fold change cutoff of 2 folds is used as standard. I believe fold change and IncLevelDiff are analogous, so a suitable cutt-off should help me select the best results?

Thanks,

Sayantan

Sayantan Laha

unread,

Jun 16, 2021, 11:52:10 AM6/16/21

to rMATS User Group

I have run the samples without the paired model, keeping everything else identical. In the unpaired output, I am getting FDR = 1 for the above case.

Moreover, when I am sorting by FDR, the paired and unpaired outputs are completely different. I am confused which version to treat as the correct one.

Please help.

Thanks,

Sayantan

--
You received this message because you are subscribed to the Google Groups "rMATS User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rmats-user-gro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rmats-user-group/bc45c501-ba0d-40a5-ac9f-2df2334a2000n%40googlegroups.com.

Thomas Danhorn

unread,

Jun 16, 2021, 11:53:31 AM6/16/21

to laha.sayan...@gmail.com, rMATS User Group

In your example it is strange that would get a very low FDR with almost no
IncLevelDiff; I suspect that the read numbers are very high, so that
minimal differences appear statistically significant, even though the
biological significance is questionable. Maybe Eric has more insights
into that.

The IncLevelDiff is certainly connected to the biological significance, so
it makes sense to apply a cutoff there as well, but your proposal of 0.5
seems very high and with most samples you will get no or very few events.
In my experience, splicing events are often more subtle, and remember that
IncLevelDiff is a difference between fractions/percentages, not a
foldchange as you use in differential expression.
Example 1:
IncLevel1 = 0.1, IncLevel2 = 0.2, IncLevelDiff = 0.1,
foldchange = 0.2/0.1 = 2x.

Example 2:
IncLevel1 = 0.5, IncLevel2 = 1.0, IncLevelDiff = 0.5,
foldchange = 1.0/0.5 = 2x.

With gene expression, you have two values per gene -- expression
(counts/CPM/TPM) in group 1, and expression in group 2. With splicing
events you have four values: Levels isoform A and levels of isoform B in
group 1, and levels of both isoforms in group 2. The fraction of isoform
A in group 1 is IncLevel1, the fraction of isoform B is (1 - IncLevel1).
So the situtation is a bit more complex than for simple gene expression.

Obviously, if you can find events with IncLevelDiff of >=0.5, these would
be very strong and likely worth looking at more closely. What an event
means in the context of the biology depends not only on the effect size
but also on the biological function of the isoforms -- what does a shift
from one to the other do in the cell?

Hope this helps,

Thomas

On Wed, 16 Jun 2021, laha.sayan...@gmail.com wrote:

>
> Hi,
> I have just run rmats with paired model. I am looking into the outputs
> for A3SS splicing event. I have sorted the FDR in descending to get the
> most significant events first.
>
> However, my second entry has Inclevel1 and IncLevel2 values as (12
> replicates in each group):
>

> *IncLevel1:
> 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0*
>
> *IncLevel2:*
> *1.0,1.0,1.0,0.999,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0*

>
> The computed inclusion level difference is 0 in this case.
>

> This event is showing *FDR = 4.83518780569625E-12*. is this right?

>
> P.S. I have looked into the Junction Counts and they have values like (only
> 5 are shown for brevity)
>

> *IJC_SAMPLE_1: 151, 129, 260, 98, 129*
> *SJC_SAMPLE_1: 0,0,0,0,0*
>
> *IJC_SAMPLE_2: 7100, 23500, 7576, 8069, 11191*
> *SJC_SAMPLE_2: 2, 9, 0, 10, 4*

>
> So from this we can see that the inclusion isoform is present in both the
> conditions, but in condition1 it is more dominant compared to condition2.
> The skipping isoform is not epressed / meagerly expressed in both
> conditions. However, since this program determines alternative splicing
> between 2 conditions, judging from the IncLeveldifference value, how is it
> significant at all?
>
> I was thinking of selecting candidates from the outputs based on FDR and
> IncLevelDifference. For instance if I impose a threshold of FDR <= 0.05 and
> |IncLevelDiff| >= 0.5 (say), will it be a appropriate? In gene expression
> studies, normally a fold change cutoff of 2 folds is used as standard. I
> believe fold change and IncLevelDiff are analogous, so a suitable cutt-off
> should help me select the best results?
>
> Thanks,
> Sayantan
>

kutsc...@gmail.com

unread,

Jun 16, 2021, 12:57:13 PM6/16/21

to rMATS User Group

I think the difference in FDR between the paired and unpaired model in that example is due to the --cstat parameter. Here's the description of --cstat from the help message:

"The cutoff splicing difference. The cutoff used in the null hypothesis test for differential splicing. The default is 0.0001 for 0.01% difference. Valid: 0 <= cutoff < 1. Does not apply to the paired stats model"

--cstat only applies to the unpaired model. The FDR of 1 that you get from the unpaired model makes sense, because that FDR is essentially the probability that the observed counts were produced from samples where the true IncLevelDifference between the two samples is at most 0.0001 (or the --cstat value you provided). Based on the IncLevels and counts that you posted, it looks like the IncLevelDifference calculated from the counts would be about 0.0001 (but was rounded to 3 decimal places and became 0.0)

The FDR of 4e-12 from the paired model also makes sense because the paired model does not use --cstat. Instead it is calculating the probability that the observed counts were produced from samples where the true IncLevelDifference between the two samples is exactly 0. Since you have 12 replicates in each sample and 100s or 1000s of counts per replicate, the paired model is very confident (FDR=4e-12) that the true IncLevelDifference is not 0

If you filter to events with absolute value of IncLevelDifference > 0.01, then I would expect the FDR values to be more similar (but still not exactly the same) between the paired and unpaired model since that filtering should mostly hide the effect of --cstat. You may want to choose a different cutoff to represent biological significance

Eric

X L

unread,

Oct 22, 2024, 7:32:27 AM10/22/24

to rMATS User Group

I appreciate Thomas's explanation. However, I believe it would be more informative to include the fold change information in the rMATS output. For example, in the scenario Thomas provided (Example 1: IncLevel1 = 0.1, IncLevel2 = 0.2, IncLevelDiff = 0.1, fold change = 0.2/0.1 = 2x). It would also be helpful if rMATS could output the average fold change (IncLevel1/IncLevel2) as a separate column, allowing users to filter results based on dPsi, fold change, or both.

Thanks,

Xiao

On Wednesday, June 16, 2021 at 11:53:31 AM UTC-4 tdan...@gmail.com wrote:

Reply all

Reply to author

Forward