[nf-core] Statistical models and contrasts

Uri David Akavia

Jan 27, 2026, 6:44:10 AM

to rMATS User Group

Hi

I'm trying to see if rMATS could fit into a pipeline like https://nf-co.re/differentialabundance/1.5.0/docs/usage/ in nf-core.

This pipeline uses contrasts, which are defined in multiple ways

1. Contrasts file

https://nf-co.re/differentialabundance/1.5.0/docs/usage/#contrasts-file

My understanding is that if it is defined this way, I can use the contrasts and comparison to identify the groups, and then compare both of them.

If that's the case, and I have 2 groups (WT/KO) with 3 samples each (WT1, WT2, WT3, KO1, KO2, KO3),

I should run something like

rmats.py --task prep --b1 WT1_bam.txt

...

rmats.py --task prep --b1 KO3_bam.txt

rmats.py --task post --b1 WT1_bam.txt --statoff

...

rmats.py --task post --b1 KO3_bam.txt --statoff

Then unify the groups (using https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/cp_with_prefix.py?) and run

rmats.py --task stat --b1 group1_bam.txt --b2 group2_bam.txt

I should NOT use the paired statistical model in this case.

The rMATS model (as I understand it) doesn't support blocking, so should I just ignore that part of the model?

Have I got this right?

2. Formulas, which are more complicated

Formulas can have lots of interaction terms

https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html

Is there any way to do a statistical analysis based on formulas? Is that something you might consider in the future?

Thank you,

Uri David

kutsc...@gmail.com

Jan 28, 2026, 12:31:52 PM

to rMATS User Group

Running prep and post steps with two groups is described at: https://github.com/Xinglab/rmats-turbo/tree/v4.3.0?tab=readme-ov-file#running-prep-and-post-separately

I would recommend running each sample in a separate prep step:
rmats.py --task prep --b1 WT1_bam.txt
...
rmats.py --task prep --b1 KO3_bam.txt

Then use cp_with_prefix.py to copy all the .rmats files from the different prep --tmp directories to a single --tmp directory to be used in the post step

The post step will run the statistical test unless --statoff is given. If you want, you could run --task post --statoff followed by a run of --task stat. Or you could just let the post step run the statistical test
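
As a rough illustration of the order of steps described above (one prep per sample, copy the .rmats files with cp_with_prefix.py, then a single post), here is a small Python sketch that only builds the command lines and does not run rMATS. The helper name build_commands, the directory names (tmp_prep_N, tmp_post, od_prep, od_post), the GTF path, the read length and "-t paired" (paired-end reads) are all placeholder assumptions; the cp_with_prefix.py argument order follows the README section linked above and should be checked against it. Each SAMPLE_bam.txt is assumed to list that one sample's BAM, and group1_bam.txt / group2_bam.txt to list all BAMs of each group.

```python
def build_commands(group1, group2, gtf="annotation.gtf", read_length=50, statoff=False):
    """Return rMATS command lines in run order: prep (per sample), copy, post.

    Hypothetical helper: all paths and directory names are placeholders.
    """
    common = f"--gtf {gtf} -t paired --readLength {read_length}"
    samples = list(group1) + list(group2)

    # One prep task per sample, each writing to its own --tmp directory.
    prep = [
        f"python rmats.py --task prep --b1 {s}_bam.txt {common} "
        f"--od od_prep --tmp tmp_prep_{i}"
        for i, s in enumerate(samples, start=1)
    ]

    # Copy every prep's .rmats files into one shared --tmp directory.
    # The prefix keeps file names from different preps from colliding.
    # The *.rmats glob is expanded by the shell when the command is run.
    copy = [
        f"python cp_with_prefix.py prep_{i}_ tmp_post/ tmp_prep_{i}/*.rmats"
        for i in range(1, len(samples) + 1)
    ]

    # A single post task over both groups; it also runs the statistical
    # test unless --statoff is given.
    post = (
        "python rmats.py --task post --b1 group1_bam.txt --b2 group2_bam.txt "
        f"{common} --od od_post --tmp tmp_post"
    )
    if statoff:
        post += " --statoff"

    return prep + copy + [post]


for cmd in build_commands(["WT1", "WT2", "WT3"], ["KO1", "KO2", "KO3"]):
    print(cmd)
```

For the WT/KO example this prints six prep commands, six copy commands and one post command.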

I'm not familiar with the "blocking" values or the formulas that you linked to, but rMATS doesn't accept any additional description of the individual samples besides the grouping into --b1 and --b2

Eric

Uri David Akavia

Jan 29, 2026, 11:02:41 AM

to kutsc...@gmail.com, rMATS User Group

Hi

Thank you very much for your helpful replies.

To clarify, if I have multiple comparisons, let's say phenotypes A, B, and C:

I can run prep on each sample (A1, A2, ..., C5) separately.

It is better to run post on ALL the samples in one task, and then follow the instructions to run stat once for A vs B and once for A vs C, using the group indices.

Is this correct?

Second question

In this situation, I should not use the paired statistical model. Did I get that right?

Thank you

Uri David


kutsc...@gmail.com

Jan 29, 2026, 3:29:02 PM

to rMATS User Group

Yes, you can run the prep step on each sample separately, then run a single post step with all the samples, and then run a stat step for each comparison: https://github.com/Xinglab/rmats-turbo/issues/96

Running all samples together in a single post step lets rMATS use all the reads to detect as many events as possible. That event set will be used for each comparison which makes it easy to compare results across comparisons (without needing to look for events with the same coordinates)
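
To make the "one post, one stat per comparison" idea concrete, here is a hedged Python sketch that derives group indices from the sample order and builds the command lines for each contrast, without running anything. It assumes the rMATS_P/prepare_stat_inputs.py script and its --new-output-dir, --old-output-dir, --group-1-indices and --group-2-indices options as I recall them from the rmats-turbo README, and it assumes indices are 0-based positions in the sample order given to the post step; please verify both against the README and the linked issue. The function names and directory names are hypothetical.

```python
def group_indices(sample_groups):
    """Map each group label to the 0-based positions of its samples.

    sample_groups lists one label per sample, in the same order the
    samples were given to the single post step (assumed convention).
    """
    indices = {}
    for position, group in enumerate(sample_groups):
        indices.setdefault(group, []).append(position)
    return indices


def stat_commands(sample_groups, contrasts, post_od="od_post"):
    """Build a prepare_stat_inputs + stat command pair for each contrast."""
    idx = group_indices(sample_groups)
    cmds = []
    for g1, g2 in contrasts:
        new_od = f"od_{g1}_vs_{g2}"  # placeholder output directory name
        i1 = ",".join(str(i) for i in idx[g1])
        i2 = ",".join(str(i) for i in idx[g2])
        cmds.append(
            "python rMATS_P/prepare_stat_inputs.py "
            f"--new-output-dir {new_od} --old-output-dir {post_od} "
            f"--group-1-indices {i1} --group-2-indices {i2}"
        )
        cmds.append(f"python rmats.py --od {new_od} --tmp tmp_{g1}_vs_{g2} --task stat")
    return cmds


# Two samples per phenotype, in post-step order: A1, A2, B1, B2, C1, C2.
for cmd in stat_commands(["A", "A", "B", "B", "C", "C"], [("A", "B"), ("A", "C")]):
    print(cmd)
```

Because every contrast reads from the same post output directory, the events in each comparison share one event set, which is the benefit described above.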

The default statistical model (non paired) works for any two groups. The paired model requires that each sample in group 1 has a matched pair in group 2: https://github.com/Xinglab/rmats-turbo/issues/356#issuecomment-1889275667
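
A pipeline could decide between the two models with a check along these lines. This is only a sketch: the paired_order helper and the subject IDs are hypothetical, and it assumes that the paired model (the --paired-stats option) matches the i-th sample of group 1 with the i-th sample of group 2, which should be confirmed in the linked issue comment.

```python
def paired_order(group1, group2):
    """Return (b1_samples, b2_samples) aligned by subject, or None.

    group1 and group2 map sample name -> subject ID. None means the
    groups are not fully matched, so the default (non-paired) model
    is the appropriate choice.
    """
    by_subject1 = {subject: sample for sample, subject in group1.items()}
    by_subject2 = {subject: sample for sample, subject in group2.items()}

    # A subject appearing twice within one group cannot be paired uniquely.
    if len(by_subject1) != len(group1) or len(by_subject2) != len(group2):
        return None
    # Every subject in group 1 needs a matched sample in group 2, and vice versa.
    if by_subject1.keys() != by_subject2.keys():
        return None

    subjects = sorted(by_subject1)
    return ([by_subject1[s] for s in subjects], [by_subject2[s] for s in subjects])


# Matched tumor/normal samples from the same patients: pairing is possible.
print(paired_order({"T1": "p1", "T2": "p2"}, {"N2": "p2", "N1": "p1"}))
# Independent WT and KO animals: no pairing, use the default model.
print(paired_order({"WT1": "m1", "WT2": "m2"}, {"KO1": "m3", "KO2": "m4"}))
```

In the WT/KO and A/B/C designs discussed in this thread the samples are independent, so the check returns None and the default model applies.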

Eric
