Issue with rMATS Statistical Model Not Producing P-values/FDR

Michael Borja

Jan 29, 2025, 6:48:14 AM

to rMATS User Group

Hi rMATS group!

I'm running into an issue where rMATS fails to generate P-values and FDR values when I run the statistical model. What is strange is that when I use the --statoff flag, rMATS successfully processes differential splicing events, and the output files contain actual splicing data. However, when I run without --statoff, the output files are empty, with no statistical results.

Details of My rMATS Runs:
- rMATS version: v4.3.0
- Script used with stats:
rmats.py --b1 /path/to/young_bams.txt \
--b2 /path/to/old_bams.txt \
--gtf /path/to/gencode.vM36.annotation.gtf \
--od /path/to/rmats_output \
--tmp /path/to/rmats_tmp \
--readLength 101 \
--nthread 8

I'm honestly not confident diagnosing this myself, so I was wondering whether this is a common issue or whether I'm missing something important. What does the statistical testing require in order to generate P-values and FDR? I have 18 samples in group 1 and 14 samples in group 2. Do the sample sizes have to be equal? Anyway, I'm more than happy to discuss further. Thank you so much!

Michael Borja
2nd year PhD student
UC Santa Cruz

kutsc...@gmail.com

Jan 29, 2025, 12:43:03 PM

to rMATS User Group

The sample sizes do not need to be the same in groups 1 and 2.

When the statistical model is run, there is a filter requiring each splicing event to have reads supporting each sample group and each isoform: https://github.com/Xinglab/rmats-turbo/issues/96#issuecomment-847847715

If --statoff is used, that filter is not applied, and the output can include splicing events that were found based on the --gtf alone. Since you have output data only when running with --statoff, my guess is that rMATS filtered out most of the reads for at least one of the sample groups. rMATS reports the reasons why reads were filtered out, as in https://github.com/Xinglab/rmats-turbo/issues/328#issuecomment-1757727653

There's also a file written to the --tmp directory, with a name like [datetime]_read_outcomes_by_bam.txt, that has the read filter counts for each sample. Hopefully that file will show what the issue is.
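
For skimming that file across many samples, something like the following may help. It is only a minimal sketch: summarize_read_outcomes is a hypothetical helper, and it ASSUMES a layout that is not confirmed in this thread, namely one line naming each BAM followed by "COUNTER: number" lines (for example "USED: 123"). Check a real file and adjust the patterns if the layout differs.

```shell
# Hypothetical helper: summarize a [datetime]_read_outcomes_by_bam.txt file.
# ASSUMED layout (not confirmed here): a line naming each BAM, followed by
# "COUNTER: number" lines such as "USED: 123".
summarize_read_outcomes() {
    awk '
        # Counter line, e.g. "SOME_FILTER: 42"
        /^[A-Za-z_0-9]+:[ \t]*[0-9]+$/ {
            split($0, kv, ":")
            key = kv[1]
            val = kv[2] + 0
            if (key == "USED") {
                used[bam] = val
            } else if (tolower(key) != "total" && val > top_val[bam]) {
                # Remember the largest non-USED counter for this BAM
                top_key[bam] = key
                top_val[bam] = val
            }
            next
        }
        # Any other non-empty line is treated as the start of a new BAM block
        NF { bam = $0; order[++n] = bam }
        END {
            for (i = 1; i <= n; i++) {
                b = order[i]
                printf "%s\tUSED=%d\ttop_filter=%s(%d)\n", b, used[b], top_key[b], top_val[b]
            }
        }
    ' "$1"
}

# Usage (the path is a placeholder):
#   summarize_read_outcomes /path/to/rmats_tmp/[datetime]_read_outcomes_by_bam.txt
```

A BAM whose USED count is tiny next to its largest filter count is the one to look at first; the name of that filter counter indicates why the reads were dropped.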

Eric

Michael Borja

Jan 29, 2025, 6:53:54 PM

to rMATS User Group

OK, thanks to your links to related conversations with similar problems, I realized I had forgotten that my BAM files have variable read lengths, and rMATS didn't like that. I reran the script with the --variable-read-length flag this time and it totally worked! Now I'm going through the individual splicing modes and filtering for significance. Thank you so much!
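
For anyone hitting the same thing, a quick way to check whether a BAM mixes read lengths is to tabulate the SEQ column of its alignments. This is a minimal sketch, not part of rMATS: read_length_histogram is a hypothetical helper that reads SAM text on stdin, and the samtools call and file name in the usage comment are assumptions about your setup.

```shell
# Hypothetical helper: print "read_length<TAB>count" for SAM records on stdin.
# Header lines (starting with "@") and records without a stored sequence
# (SEQ is "*") are skipped.
read_length_histogram() {
    awk -F'\t' '
        !/^@/ && $10 != "*" { counts[length($10)]++ }
        END { for (len in counts) print len "\t" counts[len] }
    ' | sort -n
}

# Usage (assumes samtools is installed; sample.bam is a placeholder):
#   samtools view sample.bam | head -n 100000 | read_length_histogram
```

If more than one length shows up, that matches the situation described here, where adding --variable-read-length to the original command resolved the empty output.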

Mike
