Meta-analysis - accounting for read length and depth

Phillip West

Oct 22, 2025, 6:42:09 PM
to rMATS User Group
Hi there, I am using rMATS to perform an alternative splicing meta-analysis on two published datasets. I ran the post step with --fixed-event-set to ensure that I measure the same events across the different datasets.

I have a question about how best to compare the datasets. One dataset has an average of ~55 million reads/sample with 2x75 bp reads, while the other has an average of ~225 million reads/sample with 2x150 bp reads. Does anyone have suggestions on how to account for read depth and read length in my analysis? Should I downsample the larger dataset to a similar depth, or is there a way that rMATS can account for coverage? Also, how should I account for the difference in read length?

Any advice would be appreciated.

Thanks, Phillip

kutsc...@gmail.com

Oct 24, 2025, 8:59:20 AM
to rMATS User Group
Here are some posts that discuss read length and depth in rMATS:
https://groups.google.com/g/rmats-user-group/c/hpr7j9FMgFg/m/M9qbjIYJAgAJ
https://groups.google.com/g/rmats-user-group/c/hgEOH_b5Pr0/m/KqyKUfevAAAJ
https://github.com/Xinglab/rmats-turbo/issues/83

If you process each dataset separately, you can use --fixed-event-set to run each dataset with the appropriate --readLength value. rMATS uses the read length as part of the IncLevel (PSI value) calculation: https://github.com/Xinglab/rmats-turbo/issues/349
Because of that normalization, the PSI values should be reasonably comparable across datasets.
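
To make the normalization concrete, here is a minimal Python sketch of a length-normalized PSI. It assumes the form PSI = (I/lenI) / (I/lenI + S/lenS), where I and S are the inclusion and skipping read counts and lenI and lenS are the effective lengths of the two isoform forms (reported as IncFormLen and SkipFormLen in the rMATS output, as I understand it). The function name and the example numbers are made up for illustration; this is not code from rMATS.

```python
def inc_level(inc_count, skip_count, inc_form_len, skip_form_len):
    """Length-normalized PSI: (I/lenI) / (I/lenI + S/lenS).

    inc_form_len and skip_form_len are effective lengths, which depend on
    the read length given to rMATS. Returns NaN when there are no reads.
    """
    inc = inc_count / inc_form_len
    skip = skip_count / skip_form_len
    total = inc + skip
    if total == 0:
        return float("nan")
    return inc / total


# Illustrative numbers only: twice as many inclusion reads, but the
# inclusion form is twice as long, so the normalized PSI is 0.5.
print(inc_level(100, 50, 2, 1))
```

Since the effective lengths change with read length, running each dataset with its own --readLength lets each PSI be normalized with lengths that match its reads.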

If you instead run both datasets in the same post step, the rMATS statistical model will consider the read counts when checking for significant splicing events. The higher read depth would generally give higher confidence in the PSI values for that group and lead to more significant p-values. When running both datasets together, only a single read length can be provided, so the PSI calculation won't be ideal for at least one of the datasets.
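
As a rough intuition for why depth affects confidence, the sketch below uses a plain binomial approximation for the standard error of a PSI estimate. This is not the rMATS statistical model, only a simplified illustration, and the function name and read counts are hypothetical.

```python
import math


def psi_standard_error(psi, informative_reads):
    """Binomial approximation: sqrt(psi * (1 - psi) / n).

    This is NOT the rMATS model; it only shows how uncertainty
    shrinks as the number of reads covering an event grows.
    """
    return math.sqrt(psi * (1.0 - psi) / informative_reads)


# Hypothetical event with PSI = 0.5 covered by 100 vs. 400 reads.
low_depth = psi_standard_error(0.5, 100)
high_depth = psi_standard_error(0.5, 400)
print(low_depth, high_depth)
```

Under this approximation, four times the reads on an event halves the standard error, so the deeper dataset would tend to look more certain for the same underlying PSI.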

You could try downsampling the higher read depth dataset. You could also try truncating each read in the longer read length dataset. Reducing both depth and read length to the values of the smaller dataset might make the datasets more comparable.
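
A minimal sketch of both ideas in Python, working on paired FASTQ records before alignment. The helper names are hypothetical and this is not part of rMATS; the 75 bp target and the 55/225 fraction simply come from the numbers in the question. Dedicated tools would be faster on files of this size.

```python
import random


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    it = iter(lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, sep, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def truncate(record, length):
    """Keep only the first `length` bases and quality values of a read."""
    header, seq, sep, qual = record
    return header, seq[:length], sep, qual[:length]


def downsample_pairs(mate1, mate2, fraction, seed=0):
    """Keep each read pair with probability `fraction`, mates kept together."""
    rng = random.Random(seed)
    for r1, r2 in zip(mate1, mate2):
        if rng.random() < fraction:
            yield r1, r2


# Example: trim 2x150 bp pairs to 75 bp and keep roughly 55/225 of them.
# The file names below are placeholders.
# with open("sample_R1.fastq") as f1, open("sample_R2.fastq") as f2:
#     pairs = downsample_pairs(read_fastq(f1), read_fastq(f2), 55 / 225)
#     for r1, r2 in pairs:
#         r1, r2 = truncate(r1, 75), truncate(r2, 75)
```

Sampling pairs rather than individual reads keeps the two mate files in sync, and the fixed seed makes the subsample reproducible.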

Eric