How to filter significant events

Alix

May 5, 2022, 11:15:16 AM

to rMATS User Group

Hello,

I have just started using rMATS and I am a bit confused about how to set the parameters and then filter for significant events on the output.

To my understanding, if I wanted to identify AS events with deltaPSI > 15% in my groups, I should run rMATS with --cstat 0.15, and keep all events with significant FDR.

However, I've seen many people run rMATS with the default cstat value (0.0001) and keep events with significant FDR and IncLevelDifference > 0.15. I don't understand the logic behind this approach, as I thought the p-value and FDR were dependent on the threshold that you want to test against.

I would love any insight/advice on how to run rMATS, and which approach is better.

Thanks in advance,

Best,

Alix

kutsc...@gmail.com

May 6, 2022, 9:35:44 AM

to rMATS User Group

Using --cstat 0.15 lets the statistical model enforce the desired 15% cutoff, and in that case the FDR quantifies the confidence that the event meets the cutoff. Using --cstat 0.0001 and a separate 15% filter on IncLevelDifference is still reasonable, but now the FDR value is based on a small cutoff that is easier to meet than 15%, and the IncLevelDifference of 15% is not checked with a statistical test. It's not clear that one of those approaches is better than the other. If you want more events and you are ok with using the two separate filters, then --cstat 0.0001 makes sense. If you want to be more strict about the 15% cutoff, then --cstat 0.15 makes sense.
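
For illustration, the two-filter approach can be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the table is tab-separated with FDR and IncLevelDifference columns, the 15% cutoff is applied to the absolute value of IncLevelDifference, and 0.05 is used as the FDR threshold. The function name, the example rows, and the thresholds are made up for this example.

```python
import csv
import io

# Illustrative rows only (not real results). A real rMATS output table is
# tab-separated and has many more columns than the three shown here.
EXAMPLE_TABLE = (
    "ID\tFDR\tIncLevelDifference\n"
    "1\t0.001\t0.30\n"
    "2\t0.020\t-0.22\n"
    "3\t0.010\t0.05\n"
    "4\t0.400\t0.50\n"
)

def filter_events(handle, max_fdr=0.05, min_abs_dpsi=0.15):
    """Keep rows with FDR <= max_fdr and |IncLevelDifference| >= min_abs_dpsi.

    Hypothetical helper; both thresholds are assumptions to adjust.
    """
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            fdr = float(row["FDR"])
            dpsi = float(row["IncLevelDifference"])
        except ValueError:
            continue  # skip rows with non-numeric values such as "NA"
        if fdr <= max_fdr and abs(dpsi) >= min_abs_dpsi:
            kept.append(row)
    return kept

significant = filter_events(io.StringIO(EXAMPLE_TABLE))
print([row["ID"] for row in significant])
```

For a run made with --cstat 0.15, the same helper could be called with min_abs_dpsi=0.0 so that only the FDR filter applies.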

Eric

Thomas Danhorn

May 6, 2022, 12:12:07 PM

to kutsc...@gmail.com, rMATS User Group

Alix, you can think about it this way -- If there is a chance that you
might later change the 15% delta-PSI cutoff, you would have to rerun the
stats with --cstat <newcutoff> if you are using --cstat to enforce the
cutoff. With the default settings, you only need to filter the results again,
and you can try different cutoffs to see how they affect the results. But
if 15% is some "magic" threshold that will never change, the statistics
will probably do a better job testing for it with --cstat 0.15.
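
As a rough sketch of trying several cutoffs on results from a run with the
default settings: the (FDR, IncLevelDifference) pairs below are hypothetical
placeholders rather than real rMATS output, and the 0.05 FDR threshold and
the helper name are assumptions made for this example.

```python
# Hypothetical (FDR, IncLevelDifference) pairs standing in for rows of an
# rMATS table produced with the default --cstat; not real results.
events = [
    (0.001, 0.30),
    (0.020, -0.22),
    (0.010, 0.06),
    (0.030, 0.12),
    (0.400, 0.50),
]

def count_passing(events, min_abs_dpsi, max_fdr=0.05):
    """Count events with FDR <= max_fdr and |delta-PSI| >= min_abs_dpsi."""
    return sum(
        1
        for fdr, dpsi in events
        if fdr <= max_fdr and abs(dpsi) >= min_abs_dpsi
    )

# Re-filter the same table at several cutoffs; no rerun of rMATS is needed.
for cutoff in (0.05, 0.10, 0.15, 0.20):
    print(f"|delta-PSI| >= {cutoff:.2f}: {count_passing(events, cutoff)} events")
```

With --cstat used to enforce the cutoff, each of these cutoffs would instead
need its own rerun of the statistics.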

I would be curious which approach gives you more significant genes (it
would certainly affect the p-values, and those in turn are adjusted
through the multiple testing correction), so if you decide to try both,
please report back!

Thanks,

Thomas

Jakub

May 6, 2022, 12:54:16 PM

to rMATS User Group

Dear Alix,

Yes, you are probably right that hypothesis testing should ideally include the PSI threshold. However, relatively small sample sizes and biological variability often mean that this yields a very small result set, which may miss plausible hits/true positives.

Fixed thresholds have been around for a long time; many people still use fixed log2FC thresholds in DESeq2 (I use it as an example because many people are familiar with it). The 'correct' way in DESeq2 would be to use the threshold-based Wald test, results(dds, lfcThreshold=.5, altHypothesis="greaterAbs"), which is buried in the appendix and which I have rarely seen used.

I think the best way to look at it is that you set the cstat low (I use 0.01) to capture any differentially spliced events, but then apply a fixed IncLevelDifference filter to make sure you do not capture events that are unlikely to be biologically meaningful (I certainly don't think a deltaPSI of 0.01 has any biological relevance in most cases). You will not have statistical certainty over the magnitude of your change, but at least this gives you a rough idea. This is a compromise I'm happy with, having looked at the output.

The other thing to think about is that both these approaches still let through a number of results with very low coverage: e.g. 1,1,1,0,0,4 0,0,0,0,0,0, which you may wish to filter separately.
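
A separate coverage filter could look something like the sketch below. It assumes the example above lists inclusion counts followed by skipping counts, one comma-separated value per replicate, and it uses an arbitrary minimum of 10 reads per replicate on average. The helper names and the threshold are hypothetical and would need adjusting to your data.

```python
def total_reads(*count_fields):
    """Sum all comma-separated counts across the given fields."""
    return sum(
        int(value)
        for field in count_fields
        for value in field.split(",")
        if value.strip()
    )

def passes_coverage(inclusion_counts, skipping_counts, min_mean_reads=10):
    """True if the mean (inclusion + skipping) reads per replicate meets the
    threshold. The default of 10 is an arbitrary assumption."""
    n_replicates = len(inclusion_counts.split(","))
    total = total_reads(inclusion_counts, skipping_counts)
    return total / n_replicates >= min_mean_reads

# The low-coverage example from above: 7 reads spread over 6 replicates.
print(passes_coverage("1,1,1,0,0,4", "0,0,0,0,0,0"))
# A made-up, better-covered event for comparison.
print(passes_coverage("20,35,18", "12,9,30"))
```

Such a check would be applied to each sample group separately, in addition to the FDR and IncLevelDifference filters.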

Best wishes,

Jakub

Alix

May 11, 2022, 8:23:39 AM

to rMATS User Group

Dear Eric, Thomas and Jakub,

Thanks a lot for your insight. I will probably try several parameter setups and see what the output is, but everything is much clearer now!