Meta-analysis - accounting for read length and depth

Phillip West

Oct 22, 2025, 6:42:09 PM

to rMATS User Group

Hi there, I am using rMATS to perform an alternative splicing meta-analysis on two published datasets. I ran the post step with --fixed-event-set to ensure that I measure the same events across the different datasets.

I have a question about how best to compare the datasets. One dataset has an average of ~55 million reads/sample with 2x75 bp reads, while the other has an average of ~225 million reads/sample with 2x150 bp reads. Does anyone have suggestions on how to account for read depth and read length in my analysis? Should I downsample the larger dataset to a similar depth, or is there a way for rMATS to account for coverage? And how should I account for the difference in read length?

Any advice would be appreciated.

Thanks, Phillip

kutsc...@gmail.com

Oct 24, 2025, 8:59:20 AM

to rMATS User Group

Here are some posts that discuss read length and depth in rMATS:
https://groups.google.com/g/rmats-user-group/c/hpr7j9FMgFg/m/M9qbjIYJAgAJ
https://groups.google.com/g/rmats-user-group/c/hgEOH_b5Pr0/m/KqyKUfevAAAJ
https://github.com/Xinglab/rmats-turbo/issues/83
https://groups.google.com/g/rmats-user-group/c/5FGJheixvWw/m/dH1dT2s2AAAJ

If you process each dataset separately, you can use --fixed-event-set in each run together with the --readLength value that matches that dataset. rMATS uses the read length as part of the IncLevel (PSI value) calculation: https://github.com/Xinglab/rmats-turbo/issues/349
Because of that normalization, the PSI values should be reasonable to compare across datasets.
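
To make the normalization concrete, here is a minimal Python sketch of a length-normalized PSI. It assumes the inputs correspond to the IJC, SJC, IncFormLen and SkipFormLen columns of the rMATS output, and that the two effective lengths are where the read length enters. The function name and the example numbers are made up for illustration; this is not the rMATS source code.

```python
def inc_level(ijc, sjc, inc_form_len, skip_form_len):
    """Length-normalized PSI from inclusion and skipping junction counts.

    ijc / sjc: inclusion / skipping read counts for one sample.
    inc_form_len / skip_form_len: effective lengths of the two isoform
    forms (assumed to depend on the read length given to rMATS).
    """
    inc = ijc / inc_form_len      # inclusion reads per unit of effective length
    skip = sjc / skip_form_len    # skipping reads per unit of effective length
    total = inc + skip
    if total == 0:
        return float("nan")       # no reads, so PSI is undefined
    return inc / total


# Hypothetical counts and effective lengths, for illustration only.
psi = inc_level(ijc=120, sjc=40, inc_form_len=2, skip_form_len=1)
print(round(psi, 3))  # 0.6
```

With the same raw counts, a different pair of effective lengths gives a different PSI, which is why each dataset should be run with its own --readLength.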

Alternatively, if you run both datasets in the same post step, the rMATS statistical model will consider the read counts when checking for significant splicing events. The higher read depth would generally lead to higher confidence in the PSI values for that group and to more significant p-values. When both datasets are run together, a single read length has to be provided, so the PSI value calculation won't be ideal for at least one of them.

You could try downsampling the higher read depth dataset. You could also try truncating each read in the longer read length dataset. Reducing depth and read length to match the smaller dataset might make the datasets more comparable.
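
As a rough illustration of both ideas, here is a small Python sketch that truncates FASTQ records and downsamples read pairs. It uses the numbers from this thread (~55 vs ~225 million reads, 75 vs 150 bp); the function names and the toy read pair are hypothetical, and a dedicated trimming or subsampling tool would likely be a better choice for real data. Either step would need to happen on the FASTQ files before alignment.

```python
import random


def truncate_record(record, max_len):
    """Trim one FASTQ record (header, seq, plus, qual) to max_len bases."""
    header, seq, plus, qual = record
    # Sequence and quality strings must stay the same length.
    return (header, seq[:max_len], plus, qual[:max_len])


def downsample_pairs(pairs, fraction, seed=0):
    """Keep each (mate1, mate2) pair with probability `fraction`.

    Deciding once per pair keeps the two mates in sync.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


# Values taken from the thread: target depth / current depth, and 75 bp reads.
fraction = 55 / 225
target_len = 75

# Hypothetical toy read pair, for illustration only.
pair = (
    ("@r1/1", "A" * 150, "+", "I" * 150),
    ("@r1/2", "C" * 150, "+", "I" * 150),
)
trimmed = tuple(truncate_record(mate, target_len) for mate in pair)
print(len(trimmed[0][1]), round(fraction, 3))
```

After truncating, the longer dataset would be run with the same --readLength as the shorter one.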

Eric
