rMATS not annotating a Retained Intron event from GTF file


Sayantan Laha

Aug 18, 2025, 10:06:55 AM
to rMATS User Group
Hi Eric and Group,

I performed a differential splicing analysis using the latest rMATS. I have 4 groups (3 replicates per group): WT_HS, KO_HS, WT_NHS, and KO_NHS. The analysis steps were carried out as described in the manual (briefly: alignment with STAR and extraction of the unique reads).

This is the command I ran post alignment (showing for comparison between WT_NHS and WT_HS):

rmats.py \
--b1 WT_NHS.txt \
--b2 WT_HS.txt \
--gtf /home/admin1/Analysis/ref/human/Ensembl_GRCh38/Homo_sapiens.GRCh38.114.gtf \
-t paired \
--libType fr-firststrand \
--readLength 50 \
--variable-read-length \
--nthread 4 \
--individual-counts \
--od /home/admin1/Analysis/BRS-08072025-TR/rMATS/WT_NHS_vs_WT_HS \
--tmp /home/admin1/Analysis/BRS-08072025-TR/rMATS/WT_NHS_vs_WT_HS_temp \
--task both

I did not face any issues during the analysis. However, upon inspecting the results, this is what I observed.

For retained introns, no RI events for the gene BAG3 appear in either fromGTF.RI.txt or fromGTF.novelSpliceSite.RI.txt. I checked the BAM files and found that BAG3 does have reads mapped to its introns in the WT_HS and KO_HS groups (image attached). I am confused as to why rMATS did not detect this splicing event from the gtf file that I used.

I searched online for a possible explanation and this is what I found from an earlier post to which Eric had replied:

"That looks like a retained intron event which rmats will only report if it is (mostly) annotated in the gtf file. rmats is not designed to detect unannotated RI event."

What I do not understand is what "annotated in the gtf file" actually means. I assumed that the gtf file must explicitly contain the keyword "retained_intron" as the biotype of a gene or transcript for rMATS to consider it. That does not seem to be the case, because another gene that does not carry the term "retained_intron" in the gtf file was reported in the RI category.

For your reference, I am attaching the IGV image of the bam files that I have described. Additionally I am attaching the lines of the gtf file for the gene in question "BAG3". I would be grateful if you could help me understand the reason for this ambiguity.

Regards,
Sayantan

BAG3.gtf
BAG3.png

kutsc...@gmail.com

Aug 19, 2025, 9:32:07 AM
to rMATS User Group
Here's the post you mentioned: https://github.com/Xinglab/rmats-turbo/issues/17

That post describes what rMATS requires to detect a retained intron event. It needs the two exons from the skipping isoform (an upstream exon and a downstream exon) and the junction between those two exons. It also needs an exon which goes from the upstream exon start to the downstream exon end.
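To make that condition concrete, here is a minimal Python sketch of the check as described above. It is not rMATS's actual code: the function name `find_annotated_ri` and the transcript ids are made up for illustration, and it assumes an exact coordinate match for the spanning exon within one gene on one strand.

```python
def find_annotated_ri(exons_by_transcript):
    """Return (upstream, downstream, spanning) exon triples for one gene.

    exons_by_transcript maps a transcript id to a list of (start, end)
    exon coordinates. This is a simplified reading of the rule described
    in the thread, not the rMATS implementation.
    """
    # Pool every annotated exon of the gene, regardless of transcript.
    all_exons = {exon for exons in exons_by_transcript.values() for exon in exons}
    events = set()
    for exons in exons_by_transcript.values():
        ordered = sorted(exons)
        # Consecutive exons in a transcript imply an annotated junction.
        for upstream, downstream in zip(ordered, ordered[1:]):
            spanning = (upstream[0], downstream[1])
            # The retained-intron isoform needs one exon covering both.
            if spanning in all_exons:
                events.add((upstream, downstream, spanning))
    return sorted(events)


# Hypothetical transcript ids; coordinates are the BAG3 exons from the thread.
annotation = {"tx_skip": [(119651380, 119651855), (119669851, 119670177)]}
print(find_annotated_ri(annotation))  # no spanning exon annotated yet

annotation["ri_1"] = [(119651380, 119670177)]
print(find_annotated_ri(annotation))  # one candidate event
```

With only the two-exon transcript the list is empty; once an exon spanning both is present, the pair is reported. Intron read coverage plays no part in this check.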

The IGV image shows that there is read coverage in the intron regions. rMATS can count those reads toward the inclusion isoform of a retained intron event that it already detected, but those reads won't help rMATS detect the event.

rMATS just uses the lines from the gtf with exon as the third column. It uses the coordinates from that line and also the gene_id, transcript_id, and gene_name attributes. You could add an exon line to get rMATS to detect a retained intron event. The file you attached has a transcript with these two consecutive exons: (119651380, 119651855), (119669851, 119670177). You could add this line to the gtf to get rMATS to detect a retained intron event:

10 ensembl_havana exon 119651380 119670177 . + . gene_id "ENSG00000151929"; transcript_id "ri_1"; gene_name "BAG3";
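If you want to generate lines like that for other exon pairs, here is a rough Python sketch. The helper name `make_ri_exon_line` is hypothetical, and "ri_1" is just the placeholder transcript id used above. GTF fields are tab-separated, so the sketch joins the nine columns with tabs.

```python
def make_ri_exon_line(chrom, source, upstream, downstream, strand,
                      gene_id, transcript_id, gene_name):
    """Build one GTF exon line spanning two consecutive exons.

    upstream and downstream are (start, end) tuples. The new exon runs
    from the upstream exon start to the downstream exon end.
    """
    start, end = upstream[0], downstream[1]
    # Only the attributes the thread says rMATS reads from exon lines.
    attributes = (
        f'gene_id "{gene_id}"; '
        f'transcript_id "{transcript_id}"; '
        f'gene_name "{gene_name}";'
    )
    # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
    columns = [chrom, source, "exon", str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(columns)


line = make_ri_exon_line(
    "10", "ensembl_havana",
    (119651380, 119651855), (119669851, 119670177),
    "+", "ENSG00000151929", "ri_1", "BAG3",
)
print(line)
```

Append the resulting line to a copy of the gtf rather than the original, then rerun rMATS with --gtf pointing at the copy.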

Eric