Understanding novel splice sites


Paula Navarrete

Sep 24, 2021, 5:31:24 AM
to rMATS User Group
Hi all,

I have performed an rMATS analysis and I would like to understand, on the one hand, how it detects retained introns and, on the other hand, how the novel splice site option (--novelSS) works.
My understanding is that novel splice site events contain at least one novel exon that is based on an annotated exon from the GTF, but with either the upstream or the downstream splice site at a different position than what is annotated in the GTF. However, I don't understand how it detects novel exons and how it differentiates them from retained introns.

I obtained an enrichment of differential retained introns in my analysis, but I wonder whether these detected introns are in fact novel exons: when I visualize my RNA-seq data in IGV, wild-type samples contain reads spanning a certain region between exons (which would be considered an intron) but not spanning the whole intron region. Therefore, I would like to know how rMATS identifies differential intron retention as well as novel exons when using --novelSS.

Thank you in advance,

Paula

kutsc...@gmail.com

Sep 24, 2021, 12:36:26 PM
to rMATS User Group
Here is a description of how rMATS detects retained intron events: https://github.com/Xinglab/rmats-turbo/issues/17#issuecomment-650224207

For novelSS, rMATS starts with the exons and splice junctions defined by transcripts in the GTF file. When rMATS processes a read from a BAM file, if that read contains a splice junction which is not in the GTF but both endpoints of the junction match exons in the GTF, then rMATS will define a novel junction between those two exons. This happens with or without --novelSS.

If rMATS is run with --novelSS and it processes a read with a splice junction which is not in the GTF but only one endpoint of the junction matches an exon in the GTF, then rMATS will consider the other exons that are annotated as being in a transcript with that first exon. If an exon is found that can have one of its splice sites adjusted to form a novel exon that fits the junction from the read, then rMATS will use that novel exon. Novel exons created this way are only used if their length is at most --mel (max exon length: default 500) and the junction used to form the novel exon is at least --mil (min intron length: default 50) long.
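
To make the --novelSS description above more concrete, here is a rough Python sketch of that logic. This is not rMATS code. The function name, the coordinate convention (1-based inclusive exons, with a junction given as the last exon base before the intron and the first exon base after it) and the tie-breaking rule (prefer the candidate with the smallest splice-site shift) are assumptions made only for illustration. Only the --mel and --mil defaults come from the description.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]      # (start, end), 1-based inclusive (assumed convention)
Junction = Tuple[int, int]  # (last exon base before intron, first exon base after)


def propose_novel_exon(
    junction: Junction,
    transcript_exons: List[Exon],
    max_exon_len: int = 500,   # mirrors --mel default
    min_intron_len: int = 50,  # mirrors --mil default
) -> Optional[Exon]:
    """Hypothetical helper: return a novel exon that fits the junction, or None."""
    left, right = junction
    # The junction (intron) itself must be long enough.
    if right - left - 1 < min_intron_len:
        return None

    left_known = left in {end for _, end in transcript_exons}
    right_known = right in {start for start, _ in transcript_exons}
    # Both ends annotated: plain novel junction, no novel exon needed.
    # Neither end annotated: nothing in the annotation to anchor on.
    if left_known == right_known:
        return None

    candidates = []  # (size of splice-site shift, novel exon)
    for start, end in transcript_exons:
        if left_known and right <= end:
            # Move this exon's start to the unannotated junction end.
            novel, shift = (right, end), abs(right - start)
        elif right_known and start <= left:
            # Move this exon's end to the unannotated junction end.
            novel, shift = (start, left), abs(left - end)
        else:
            continue
        if novel[1] - novel[0] + 1 <= max_exon_len:
            candidates.append((shift, novel))

    # Assumed tie-break: smallest adjustment of the annotated splice site.
    return min(candidates)[1] if candidates else None


exons = [(100, 200), (400, 500), (700, 800)]
print(propose_novel_exon((200, 420), exons))  # junction leaves the exon ending at 200
```

In this toy transcript the junction starts at the annotated end of the first exon (200) and lands at 420, which is not an annotated exon start, so the second exon is the closest one whose start could be moved to fit it.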

rMATS relies on the GTF for annotated retained introns. Even with --novelSS, I don't think rMATS will detect a retained intron event unless the intron is annotated as an exon.

You described a situation where there are reads in only a portion of the region between two exons and those reads are really from a novel exon. I think that rMATS will only count those reads toward a retained intron event if the intron is annotated as an exon in the GTF. Also, rMATS will only detect those reads as being part of a novel exon if there is some other annotated exon which shares at least 1 splice site with the novel exon. It could be that the GTF incorrectly marked the region as a retained intron, so the rMATS results also incorrectly include the retained intron event.
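
If you want to check your GTF for this, a quick sanity check could look like the sketch below. It only tests whether some annotated exon fully covers the intron between two exons, which is my loose reading of "the intron is annotated as an exon". The exact rule rMATS applies is in the linked issue, and the function name and the 1-based inclusive coordinates here are assumptions.

```python
from typing import Iterable, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive (assumed convention)


def intron_covered_by_exon(
    upstream: Exon, downstream: Exon, annotated_exons: Iterable[Exon]
) -> bool:
    """Hypothetical check: does any annotated exon span the whole intron?"""
    intron_start = upstream[1] + 1
    intron_end = downstream[0] - 1
    return any(
        start <= intron_start and end >= intron_end
        for start, end in annotated_exons
    )


annotated = [(100, 200), (400, 500), (250, 300)]
# An exon covering only part of the intron (250-300) does not count here.
print(intron_covered_by_exon((100, 200), (400, 500), annotated))
```

The exon coordinates would come from the exon records of your GTF for the gene in question. If no annotated exon covers the intron, that would be consistent with rMATS not reporting a retained intron event there.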

Eric