Technical Question – rMATS with Genotype-Specific Custom References

Odudu James

Apr 28, 2026, 5:13:50 PM
to rMATS User Group
Dear rMATS development team,

I am writing to seek technical guidance on whether rMATS is appropriate for my experimental setup and, if so, how best to configure it.

**Experimental context:**
I am studying alternative splicing at intron 1 of TP53 across 4 genotypes:
- Alu_free (reference level for comparisons)
- H_sense (transgene inserted in sense orientation)
- H_antisense (transgene inserted in antisense orientation)
- H_intron (wild type)

Each genotype has 3 biological replicates (H_sense has 4), giving 13 BAM files in total. All BAM files were produced by aligning with STAR.

**The core issue:**
Each genotype was aligned to its own custom reference genome and GTF file. The custom references were built by extracting a ~90 kb region of chromosome 17 encompassing TP53 and its flanking genes, reverse-complementing it into a mini-chromosome (because TP53 is on the minus strand), and soft-masking the remainder of the reference annotation. In the non-wild-type genotypes, intron 1 of TP53 was replaced with a modified sequence (the transgene); this changes the length of intron 1 and shifts all downstream exon coordinates by a genotype-specific offset. As a result, each genotype's BAM files exist in a different coordinate space.
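The two coordinate transformations described above (reverse-complementing the extracted region, then shifting everything downstream of intron 1 by the genotype-specific length change) can be sketched as follows. This is a minimal illustration assuming 1-based inclusive coordinates, as in GTF files; the function names and the toy numbers are hypothetical, not the values actually used to build the references.

```python
# Hypothetical helpers illustrating the reference construction described above.
# Coordinates are assumed to be 1-based and inclusive, as in GTF files.


def to_minichrom(pos: int, region_start: int, region_end: int) -> int:
    """Map a chr17 position inside the extracted region onto the
    reverse-complemented mini-chromosome (position 1 = region_end)."""
    if not region_start <= pos <= region_end:
        raise ValueError(f"{pos} lies outside {region_start}-{region_end}")
    # Reverse complementing reverses the order of bases in the region.
    return region_end - pos + 1


def shift_for_genotype(pos: int, intron_start: int, intron_end: int,
                       offset: int) -> int:
    """Translate a wild-type mini-chromosome position into a transgene
    genotype's mini-chromosome, where intron 1 (intron_start..intron_end
    in wild-type coordinates) changed length by `offset` bases."""
    if pos < intron_start:
        return pos  # upstream of intron 1: unchanged
    if pos > intron_end:
        return pos + offset  # downstream of intron 1: shifted by the offset
    # Inside the replaced intron there is no one-to-one correspondence.
    raise ValueError(f"{pos} lies inside the replaced intron")


# Toy example with made-up numbers: a 100 bp region, and a genotype whose
# intron 1 is 150 bp longer than the wild-type intron.
print(to_minichrom(150, 100, 199))             # 50
print(shift_for_genotype(251, 101, 200, 150))  # 401
```

Because the mini-chromosome is reverse-complemented, TP53 reads left to right on it, so "downstream of intron 1" simply means larger coordinates.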

**My questions:**
1. Can rMATS handle pairwise comparisons where the two groups were aligned to different reference genomes with different coordinate systems? If not, is remapping all samples to a single unified reference the only solution?

2. If a unified reference is required, what would you recommend as the best strategy for constructing it given that the genotypes differ only within intron 1 of TP53?

3. Would rMATS be able to detect novel splice sites created by the transgene insertion, or is it limited to splice events present in the provided GTF?

4. Is there a recommended way to restrict the rMATS analysis to a specific genomic region (intron 1 of TP53 and its flanking exons) rather than running it genome-wide?

Thank you very much for your time. I am happy to provide additional details about the reference construction or file structure if helpful.

kutsc...@gmail.com

Apr 29, 2026, 3:38:55 PM
to rMATS User Group
rMATS assumes a single coordinate system. I don't think there is an easy way for rMATS itself to handle two groups whose genome region of interest differs this much. If you were to align everything to the same reference sequence, I would expect many of the reads to be aligned with insertions or deletions, or to fail to align. rMATS actually filters out alignments that contain insertions or deletions.
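To gauge how many reads would be affected if every sample were aligned to one shared reference, one could count alignments whose CIGAR strings contain insertion (I) or deletion (D) operations. This is a minimal pure-Python sketch, assuming the CIGAR strings have already been extracted (for example, column 6 of `samtools view` output); the helper names are hypothetical, and the count only approximates the read filtering rMATS performs.

```python
import re

# A CIGAR string is a run of <length><operation> pairs, e.g. "30M5I65M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """True if the alignment contains an insertion (I) or deletion (D)."""
    return any(op in ("I", "D") for _, op in CIGAR_OP.findall(cigar))


def indel_fraction(cigars) -> float:
    """Fraction of alignments whose CIGAR contains an insertion or deletion."""
    cigars = list(cigars)
    if not cigars:
        return 0.0
    return sum(has_indel(c) for c in cigars) / len(cigars)


# Spliced reads use N for the intron gap, which is not an indel.
example = ["100M", "30M5I65M", "40M3D60M", "50M200N50M"]
print(indel_fraction(example))  # 0.5
```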

rMATS can detect novel splice sites if it is run with --novelSS.
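As an illustration, the rmats.py command line with --novelSS enabled might be assembled like this. The sketch uses rMATS-turbo's standard options (--b1, --b2, --gtf, -t, --readLength, --od, --tmp, --nthread); the file names, read length, and thread count are placeholders, the helper name is hypothetical, and it is worth checking `rmats.py --help` for the installed version. The sketch only builds and prints the command; it does not run rMATS.

```python
import shlex


def build_rmats_cmd(b1, b2, gtf, od, tmp, read_length,
                    paired=True, novel_ss=True, nthread=4):
    """Assemble an rmats.py argument list (hypothetical helper)."""
    cmd = [
        "rmats.py",
        "--b1", b1,    # text file listing group-1 BAMs, comma-separated
        "--b2", b2,    # text file listing group-2 BAMs, comma-separated
        "--gtf", gtf,
        "-t", "paired" if paired else "single",
        "--readLength", str(read_length),
        "--od", od,    # output directory
        "--tmp", tmp,  # directory for intermediate files
        "--nthread", str(nthread),
    ]
    if novel_ss:
        cmd.append("--novelSS")  # also detect unannotated splice sites
    return cmd


cmd = build_rmats_cmd("b1.txt", "b2.txt", "region.gtf", "out", "tmp", 150)
print(shlex.join(cmd))
```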

You can restrict rMATS to a specific region by using a --gtf that only includes the exons in the region of interest. Any alignments that don't overlap an exon from the --gtf will be filtered out.
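A region-restricted GTF could be produced by keeping only the lines whose feature overlaps a window around intron 1 and its flanking exons. This is a minimal sketch, assuming standard 9-column, tab-separated GTF lines with 1-based inclusive coordinates; the function name is hypothetical. Transcripts that extend beyond the window keep only their in-window exons, which should be harmless for events confined to the window, but that is an assumption worth checking.

```python
def filter_gtf(lines, seqname, start, end):
    """Yield GTF lines on `seqname` whose feature overlaps [start, end]
    (1-based, inclusive). Comment and blank lines are dropped."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid GTF record
        f_seq, f_start, f_end = fields[0], int(fields[3]), int(fields[4])
        # Two intervals overlap when each starts before the other ends.
        if f_seq == seqname and f_start <= end and f_end >= start:
            yield line
```

For example, the lines yielded by `filter_gtf(open("custom.gtf"), "TP53_mini", 1, 20000)` could be written to a new file and passed as --gtf; the file name, sequence name, and window here are placeholders.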

One thing you could try is processing each group separately, using that group's own reference genome and GTF both for aligning the reads and for running rMATS. Then you could attempt to match up events across the output files (such as SE.MATS.JC.txt) by translating the coordinates. For each reference genome, you could translate the coordinate columns in the rMATS output files into offsets relative to the intron that differs: each coordinate could be converted to something like {offset}_before, {offset}_within, or {offset}_after, depending on whether it falls before, within, or after that intron.
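A minimal sketch of this coordinate translation for SE events follows. It assumes the differing intron's bounds are known in each reference as 1-based inclusive positions, and that the SE.MATS.JC.txt coordinate columns are exonStart_0base, exonEnd, upstreamES, upstreamEE, downstreamES, and downstreamEE, where the start columns are 0-based and the end columns are 1-based ends. The helper names and example rows are hypothetical; rows could come from `csv.DictReader(f, delimiter="\t")`.

```python
# (column name, whether the column holds a 0-based start) for SE events
SE_COORD_COLUMNS = (
    ("upstreamES", True), ("upstreamEE", False),
    ("exonStart_0base", True), ("exonEnd", False),
    ("downstreamES", True), ("downstreamEE", False),
)


def translate_coord(pos: int, intron_start: int, intron_end: int) -> str:
    """Express a 1-based position as an offset before, within, or after
    the intron that differs between references."""
    if pos < intron_start:
        return f"{intron_start - pos}_before"
    if pos <= intron_end:
        return f"{pos - intron_start}_within"
    return f"{pos - intron_end}_after"


def event_key(row: dict, intron_start: int, intron_end: int) -> tuple:
    """Build a reference-independent key for one SE.MATS.JC.txt row."""
    key = []
    for column, zero_based_start in SE_COORD_COLUMNS:
        pos = int(row[column])
        if zero_based_start:
            pos += 1  # convert 0-based starts to 1-based positions
        key.append(translate_coord(pos, intron_start, intron_end))
    return tuple(key)


# Made-up rows: the same SE event in a wild-type reference (intron 101-200)
# and in a transgene reference whose intron is 150 bp longer (101-350).
wild_type = {"upstreamES": "10", "upstreamEE": "100",
             "exonStart_0base": "250", "exonEnd": "300",
             "downstreamES": "400", "downstreamEE": "450"}
transgene = {"upstreamES": "10", "upstreamEE": "100",
             "exonStart_0base": "400", "exonEnd": "450",
             "downstreamES": "550", "downstreamEE": "600"}
print(event_key(wild_type, 101, 200) == event_key(transgene, 101, 350))  # True
```

Events with coordinates inside the replaced intron get *_within labels and will generally not match across genotypes, since the intron sequence itself differs.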

If you do manage to find the same event in two groups, then you can run the rMATS statistical model using the count columns (such as IJC_SAMPLE_1) and the isoform length columns (such as IncFormLen). You can run rMATSexe directly on a file containing the counts, as in this post: https://groups.google.com/g/rmats-user-group/c/2PJ6DWFu1m8/m/0J0eY3XlAAAJ
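The merge step could be sketched as below. It assumes each group was run on its own, so its replicate counts (comma-separated, one value per BAM) sit in the IJC_SAMPLE_1 and SJC_SAMPLE_1 columns, and that matched events have already been paired up (for example, by a translated-coordinate key). The output column layout follows the rMATS count and length columns named above, but the exact input format rMATSexe expects should be confirmed against the linked post before running it. Refusing to merge events whose isoform lengths disagree is a conservative choice made here, not an rMATS requirement.

```python
import csv

HEADER = ["ID", "IJC_SAMPLE_1", "SJC_SAMPLE_1",
          "IJC_SAMPLE_2", "SJC_SAMPLE_2", "IncFormLen", "SkipFormLen"]


def merge_matched_events(matches, out):
    """Write one tab-separated row per matched event.

    `matches` yields (row_a, row_b) dicts for the same event found in two
    separately processed groups (hypothetical helper)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for new_id, (a, b) in enumerate(matches, start=1):
        lengths_a = (a["IncFormLen"], a["SkipFormLen"])
        lengths_b = (b["IncFormLen"], b["SkipFormLen"])
        if lengths_a != lengths_b:
            raise ValueError(f"isoform lengths differ for event {new_id}")
        writer.writerow([new_id,
                         a["IJC_SAMPLE_1"], a["SJC_SAMPLE_1"],
                         b["IJC_SAMPLE_1"], b["SJC_SAMPLE_1"],
                         *lengths_a])
```

Writing to an open text file (or `io.StringIO` for a quick check) produces a header line followed by one row per matched event.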

Eric